org.apache.uima.lucas.indexer.analysis
Class AnnotationTokenStream

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.uima.lucas.indexer.analysis.AnnotationTokenStream

public class AnnotationTokenStream
extends org.apache.lucene.analysis.TokenStream

AnnotationTokenStream represents a TokenStream that extracts tokens from the feature values of annotations of a given type in a JCas object. Each token takes its start and end offsets from the annotation object. This class supports only the following UIMA JCas feature types:

  1. String
  2. StringArray
  3. FSArray
  4. Number types
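To make the list of supported feature types concrete, here is a minimal sketch that maps plain Java values (standing in for UIMA feature values) to term strings. It is not the LuCas implementation: the class `FeatureValueTypes` and method `toTermTexts` are hypothetical, FSArray values are left out because they need real feature structures, and emitting one string per StringArray element is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the LuCas implementation): converts a plain Java value,
 * standing in for a UIMA feature value, into the strings used as term text.
 */
final class FeatureValueTypes {

    static List<String> toTermTexts(Object value) {
        if (value == null) {
            // Assumption: a missing feature value produces no term text
            return List.of();
        }
        if (value instanceof String s) {
            // String feature: the value itself is the term text
            return List.of(s);
        }
        if (value instanceof String[] array) {
            // StringArray feature (assumption): one string per array element
            return Arrays.asList(array);
        }
        if (value instanceof Number n) {
            // Number feature: default string conversion
            return List.of(n.toString());
        }
        // Anything else mirrors the "supports only the following types" restriction
        throw new IllegalArgumentException(
                "Unsupported feature value type: " + value.getClass().getName());
    }
}
```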


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Constructor Summary
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName)
          Creates a TokenStream which extracts all coveredText feature values of annotations of a given type from a JCas object.
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName, List<String> featureNames, Map<String,Format> featureFormats)
          Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object.
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName, List<String> featureNames, String delimiter, Map<String,Format> featureFormats)
          Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object.
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName, String featureName, Format featureFormat)
          Creates a TokenStream which extracts all feature values of a given feature name from annotations with a given type from a given JCas object.
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName, String featurePath, List<String> featureNames, Map<String,Format> featureFormats)
          Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object.
AnnotationTokenStream(org.apache.uima.jcas.JCas jCas, String sofaName, String typeName, String featurePath, List<String> featureNames, String delimiter, Map<String,Format> featureFormats)
          Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object.
 
Method Summary
protected  Iterator<org.apache.uima.cas.FeatureStructure> createFeatureStructureIterator(org.apache.uima.jcas.tcas.Annotation annotation, String featurePath)
           
protected  Iterator<String> createFeatureValueIterator(org.apache.uima.cas.FeatureStructure srcFeatureStructure, Collection<String> featureNames)
           
 org.apache.uima.cas.Type getAnnotationType()
           
 String getDelimiter()
           
 Map<String,Format> getFeatureFormats()
           
 List<String> getFeatureNames()
           
 String getFeaturePath()
           
 org.apache.uima.jcas.JCas getJCas()
           
 String getValueForFeature(org.apache.uima.cas.FeatureStructure featureStructure, org.apache.uima.cas.Feature feature, Format format)
           
protected  void initializeIterators()
           
 org.apache.lucene.analysis.Token next()
           
 org.apache.lucene.analysis.Token next(org.apache.lucene.analysis.Token token)
           
 void reset()
           
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
close, end, getOnlyUseNewAPI, incrementToken, setOnlyUseNewAPI
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all coveredText feature values of annotations of a given type from a JCas object. Each token has the start and end offset of the annotation and uses the covered text as term text.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
Throws:
org.apache.uima.cas.CASException
InvalidTokenSourceException
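In UIMA, the covered text of an annotation is the slice of the document text between its begin and end offsets. The following sketch models what this constructor produces under that reading, using hypothetical stand-in records (`Span`, `SimpleToken`) instead of the real JCas and Lucene Token classes.

```java
import java.util.ArrayList;
import java.util.List;

final class CoveredTextSketch {

    /** Hypothetical stand-in for an annotation: only its begin and end offsets. */
    record Span(int begin, int end) {}

    /** Hypothetical stand-in for a Lucene token: term text plus offsets. */
    record SimpleToken(String termText, int startOffset, int endOffset) {}

    static List<SimpleToken> tokens(String documentText, List<Span> annotations) {
        List<SimpleToken> result = new ArrayList<>();
        for (Span a : annotations) {
            // The covered text is the document text between begin and end
            String coveredText = documentText.substring(a.begin(), a.end());
            // The token keeps the annotation's offsets unchanged
            result.add(new SimpleToken(coveredText, a.begin(), a.end()));
        }
        return result;
    }
}
```

For the text "Apache UIMA and Lucene" with annotations at (0, 6) and (16, 22), this yields the tokens "Apache" and "Lucene" with exactly those offsets.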

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName,
                             String featureName,
                             Format featureFormat)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all feature values of a given feature name from annotations with a given type from a given JCas object. Each token has the start and end offset of the annotation and uses the feature value as term text.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
featureName - the name of the feature from which the token text is built
featureFormat - optional format object to convert feature values to strings
Throws:
InvalidTokenSourceException
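The optional featureFormat parameter is a java.text.Format. A minimal sketch of how such a format could turn a numeric feature value into term text follows; falling back to String.valueOf when no format is given is an assumption, and `FeatureFormatSketch` is a hypothetical name.

```java
import java.text.Format;

final class FeatureFormatSketch {

    /** Converts a feature value to term text, using the optional format when one is given. */
    static String toTermText(Object featureValue, Format featureFormat) {
        if (featureFormat == null) {
            // Assumption: without a format, use the default string conversion
            return String.valueOf(featureValue);
        }
        // Format.format(Object) renders the value according to the format's pattern
        return featureFormat.format(featureValue);
    }
}
```

With `new java.text.DecimalFormat("0000")`, the value 42 becomes "0042"; zero padding like this keeps numeric terms in numeric order when they are compared as strings.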

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName,
                             List<String> featureNames,
                             String delimiter,
                             Map<String,Format> featureFormats)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object. Each token has the start and end offset of the annotation and uses the concatenation of all the feature values as term text. Optionally, the different feature values of an annotation can be concatenated with a delimiter.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
featureNames - the names of the features from which the token text is built
delimiter - a delimiter for concatenating the different feature values of an annotation object. If null, a white space is used.
featureFormats - optional map of format objects to convert feature values to strings; the keys must be feature names
Throws:
InvalidTokenSourceException
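The delimiter rule above can be sketched as follows. The sketch assumes "a white space" means a single space character; `DelimiterSketch` is a hypothetical name and the feature values are already converted to strings.

```java
import java.util.List;

final class DelimiterSketch {

    /** Joins the feature values of one annotation into a single term text. */
    static String termText(List<String> featureValues, String delimiter) {
        // Assumption: a null delimiter falls back to a single space character
        String effective = (delimiter == null) ? " " : delimiter;
        return String.join(effective, featureValues);
    }
}
```

For the values "10115" and "Berlin", a delimiter of "|" gives "10115|Berlin", and a null delimiter gives "10115 Berlin".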

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName,
                             List<String> featureNames,
                             Map<String,Format> featureFormats)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object. Each token has the start and end offset of the annotation and uses the concatenation of all the feature values as term text.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
featureNames - the names of the features from which the token text is built
featureFormats - optional map of format objects to convert feature values to strings; the keys must be feature names
Throws:
InvalidTokenSourceException

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName,
                             String featurePath,
                             List<String> featureNames,
                             Map<String,Format> featureFormats)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object. The addressed features belong to a feature structure that is a direct or indirect feature value of an annotation. For example, an annotation of type person has a feature address whose values are address feature structures with features for the street, postal code, and city. To create tokens from the postal code and city of a person's address, featurePath must be "address" and featureNames must be "postalCode" and "city". Each token has the start and end offset of the annotation and uses the concatenation of all the feature values as term text.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
featurePath - the path to the feature structures whose features are used for tokens. Path entries are separated by ".", for example "affiliation.address.country".
featureNames - the names of the features from which the token text is built
featureFormats - optional map of format objects to convert feature values to strings; the keys must be feature names
Throws:
InvalidTokenSourceException
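The person/address example above can be modeled with nested maps standing in for feature structures. This is a hypothetical sketch (`FeaturePathSketch` is not part of LuCas): it follows each "."-separated path entry to a single nested structure and does not cover array-valued path steps such as FSArray.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FeaturePathSketch {

    /** Follows a "."-separated feature path through nested maps (stand-ins for feature structures). */
    @SuppressWarnings("unchecked")
    static Map<String, Object> resolve(Map<String, Object> start, String featurePath) {
        Map<String, Object> current = start;
        for (String step : featurePath.split("\\.")) {
            Object next = current.get(step);
            if (!(next instanceof Map)) {
                throw new IllegalArgumentException(
                        "Path step '" + step + "' does not lead to a feature structure");
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    /** Collects the values of the named features from the structure the path points to. */
    static List<String> values(Map<String, Object> annotation, String featurePath,
                               List<String> featureNames) {
        Map<String, Object> target = resolve(annotation, featurePath);
        List<String> result = new ArrayList<>();
        for (String name : featureNames) {
            result.add(String.valueOf(target.get(name)));
        }
        return result;
    }
}
```

With featurePath "address" and featureNames "postalCode" and "city", a person whose address is (Main St, 10115, Berlin) yields the values "10115" and "Berlin", which the stream would then concatenate into one term text.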

AnnotationTokenStream

public AnnotationTokenStream(org.apache.uima.jcas.JCas jCas,
                             String sofaName,
                             String typeName,
                             String featurePath,
                             List<String> featureNames,
                             String delimiter,
                             Map<String,Format> featureFormats)
                      throws InvalidTokenSourceException
Creates a TokenStream which extracts all feature values of a given feature name list from annotations with a given type from a given JCas object. The addressed features belong to a feature structure that is a direct or indirect feature value of an annotation. For example, an annotation of type person has a feature address whose values are address feature structures with features for the street, postal code, and city. To create tokens from the postal code and city of a person's address, featurePath must be "address" and featureNames must be "postalCode" and "city". Each token has the start and end offset of the annotation and uses the concatenation of all the feature values as term text. Optionally, the different feature values of an annotation can be concatenated with a delimiter.

Parameters:
jCas - the JCas object
sofaName - the name of the subject of analysis (sofa)
typeName - the type of the annotation
featurePath - the path to the feature structures whose features are used for tokens. Path entries are separated by ".", for example "affiliation.address.country".
featureNames - the names of the features from which the token text is built
delimiter - a delimiter for concatenating the different feature values of an annotation object. If null, a white space is used.
featureFormats - optional map of format objects to convert feature values to strings; the keys must be feature names
Throws:
InvalidTokenSourceException

Method Detail

next

public org.apache.lucene.analysis.Token next(org.apache.lucene.analysis.Token token)
                                      throws IOException
Overrides:
next in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

next

public org.apache.lucene.analysis.Token next()
                                      throws IOException
Overrides:
next in class org.apache.lucene.analysis.TokenStream
Throws:
IOException
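These next methods follow the older Lucene TokenStream contract, in which next() returns the next token or null once the stream is exhausted. The sketch below shows the usual "loop until null" consumption idiom together with a reset that rewinds the stream. `MiniStream` and `SimpleToken` are hypothetical stand-ins. That AnnotationTokenStream.reset() re-creates its iterators in a similar way is an assumption based on the presence of initializeIterators().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class PullLoopSketch {

    /** Hypothetical stand-in for a Lucene token: term text plus offsets. */
    record SimpleToken(String termText, int startOffset, int endOffset) {}

    /** Hypothetical stream using the old next()-returns-null protocol. */
    static final class MiniStream {
        private final List<SimpleToken> tokens;
        private Iterator<SimpleToken> iterator;

        MiniStream(List<SimpleToken> tokens) {
            this.tokens = tokens;
            reset();
        }

        /** Returns the next token, or null once the stream is exhausted. */
        SimpleToken next() {
            return iterator.hasNext() ? iterator.next() : null;
        }

        /** Rewinds the stream so that it can be consumed again. */
        void reset() {
            iterator = tokens.iterator();
        }
    }

    /** Drains a stream with the classic "loop until null" idiom. */
    static List<String> drain(MiniStream stream) {
        List<String> terms = new ArrayList<>();
        for (SimpleToken t = stream.next(); t != null; t = stream.next()) {
            terms.add(t.termText());
        }
        return terms;
    }
}
```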

initializeIterators

protected void initializeIterators()

createFeatureStructureIterator

protected Iterator<org.apache.uima.cas.FeatureStructure> createFeatureStructureIterator(org.apache.uima.jcas.tcas.Annotation annotation,
                                                                                        String featurePath)

createFeatureValueIterator

protected Iterator<String> createFeatureValueIterator(org.apache.uima.cas.FeatureStructure srcFeatureStructure,
                                                      Collection<String> featureNames)

getValueForFeature

public String getValueForFeature(org.apache.uima.cas.FeatureStructure featureStructure,
                                 org.apache.uima.cas.Feature feature,
                                 Format format)

reset

public void reset()
Overrides:
reset in class org.apache.lucene.analysis.TokenStream

getFeatureFormats

public Map<String,Format> getFeatureFormats()

getJCas

public org.apache.uima.jcas.JCas getJCas()

getFeaturePath

public String getFeaturePath()

getFeatureNames

public List<String> getFeatureNames()

getDelimiter

public String getDelimiter()

getAnnotationType

public org.apache.uima.cas.Type getAnnotationType()


Copyright © 2006-2011 The Apache Software Foundation. All Rights Reserved.