Package | Description |
---|---|
`gov.sandia.cognition.text.term` | Provides classes representing terms in the text content of documents. |
`gov.sandia.cognition.text.term.filter` | Provides classes for filtering and transforming terms. |
`gov.sandia.cognition.text.term.vector` | Provides methods for handling documents represented as term vectors. |
`gov.sandia.cognition.text.token` | Provides text tokenization algorithms. |

Modifier and Type | Interface and Description |
---|---|
interface | `IndexedTerm`: Interface for a term plus its index. |
interface | `Term`: Interface for a term, which is a basic unit of data in information retrieval. |
interface | `TermNGram`: Interface for a term that is some type of n-gram. |
interface | `TermOccurrence`: Interface for an occurrence of a term in some text. |

Modifier and Type | Class and Description |
---|---|
class | `AbstractTerm`: An abstract implementation of the `Term` interface. |
class | `DefaultIndexedTerm`: Default implementation of the `IndexedTerm` interface. |
class | `DefaultTerm`: A default implementation of the `Term` interface. |
class | `DefaultTermNGram`: A default implementation of the `TermNGram` interface. |
class | `DefaultTermOccurrence`: A default implementation of the `TermOccurrence` interface. |

Modifier and Type | Method and Description |
---|---|
IndexedTerm | `AbstractTermIndex.add(Termable termable)` |
void | `DefaultTermCounts.add(Termable term)`: Increments the count for a given term. |
IndexedTerm | `TermIndex.add(Termable termable)`: Adds the given term to the index. |
int | `AbstractTermIndex.getIndex(Termable termable)` |
int | `TermIndex.getIndex(Termable term)`: Gets the index of the given term. |
IndexedTerm | `AbstractTermIndex.getIndexedTerm(Termable termable)` |
IndexedTerm | `TermIndex.getIndexedTerm(Termable term)`: Gets the index-term pair for the given term, if it is in the index. |
boolean | `AbstractTermIndex.hasTerm(Termable termable)` |
boolean | `TermIndex.hasTerm(Termable term)`: Determines if the index contains the given term. |

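
The index operations listed above (`add`, `getIndex`, `getIndexedTerm`, `hasTerm`) can be illustrated with a minimal, self-contained sketch. It does not use the library: `SimpleTermable`, `SimpleIndexedTerm`, and `SimpleTermIndex` are hypothetical stand-ins, and returning `-1` or `null` for a missing term is an assumption of this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins written for illustration only.
// They are NOT the gov.sandia.cognition.text.term classes.
final class TermIndexSketch {

    /** Stand-in for Termable: anything that can be viewed as a term. */
    interface SimpleTermable {
        String asTermName();
    }

    /** Stand-in for IndexedTerm: a term name paired with its index. */
    static final class SimpleIndexedTerm {
        final int index;
        final String name;

        SimpleIndexedTerm(int index, String name) {
            this.index = index;
            this.name = name;
        }
    }

    /** Stand-in for TermIndex, backed by a hash map for lookup. */
    static final class SimpleTermIndex {
        private final Map<String, SimpleIndexedTerm> byName = new HashMap<>();

        /** Adds the term if it is new; either way returns its index-term pair. */
        SimpleIndexedTerm add(SimpleTermable termable) {
            String name = termable.asTermName();
            SimpleIndexedTerm entry = byName.get(name);
            if (entry == null) {
                // New terms take the next free index, starting from 0.
                entry = new SimpleIndexedTerm(byName.size(), name);
                byName.put(name, entry);
            }
            return entry;
        }

        /** Determines whether the term is already in the index. */
        boolean hasTerm(SimpleTermable termable) {
            return byName.containsKey(termable.asTermName());
        }

        /** Returns the index-term pair, or null if absent (assumption). */
        SimpleIndexedTerm getIndexedTerm(SimpleTermable termable) {
            return byName.get(termable.asTermName());
        }

        /** Returns the term's index, or -1 if absent (assumption). */
        int getIndex(SimpleTermable termable) {
            SimpleIndexedTerm entry = getIndexedTerm(termable);
            return entry == null ? -1 : entry.index;
        }
    }

    public static void main(String[] args) {
        SimpleTermIndex index = new SimpleTermIndex();
        SimpleTermable cat = () -> "cat";
        SimpleTermable dog = () -> "dog";
        index.add(cat);
        index.add(dog);
        index.add(cat); // already indexed, so no new entry is created
        System.out.println(index.getIndex(dog));          // prints 1
        System.out.println(index.hasTerm(() -> "bird"));  // prints false
    }
}
```

Because the parameter is an interface, any object that can present itself as a term can be passed in; the lambdas in `main` play that role here.
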
Modifier and Type | Method and Description |
---|---|
void | `AbstractTermIndex.addAll(java.lang.Iterable<? extends Termable> terms)` |
void | `DefaultTermCounts.addAll(java.lang.Iterable<? extends Termable> terms)`: Adds all of the given terms to the counters; one for each term occurrence. |
void | `TermIndex.addAll(java.lang.Iterable<? extends Termable> terms)`: Adds all of the given terms to the index, if they are not already part of it. |

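
The two `addAll` descriptions above differ in how they treat repeated terms: a counter records every occurrence, while an index adds a term only once. The sketch below contrasts the two behaviors using plain strings in place of `Termable`; `AddAllSketch`, `countAll`, and `indexAll` are hypothetical names, not library methods.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; plain strings stand in for Termable objects.
final class AddAllSketch {

    /** Counter-style addAll: every occurrence increments its term's count. */
    static Map<String, Integer> countAll(Iterable<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    /** Index-style addAll: a term is added only if it is not already present. */
    static Map<String, Integer> indexAll(Iterable<String> terms) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String term : terms) {
            // The next free index is the current size of the map.
            index.putIfAbsent(term, index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("a", "b", "a");
        System.out.println(countAll(terms)); // prints {a=2, b=1}
        System.out.println(indexAll(terms)); // prints {a=0, b=1}
    }
}
```
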
Modifier and Type | Method and Description |
---|---|
boolean | `DefaultStopList.contains(Termable term)` |
boolean | `StopList.contains(Termable term)`: Determines if the given term is contained in this stop list. |

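
As a rough illustration of how a `contains` check like the one above can drive term filtering, here is a minimal sketch. `SimpleStopList` and `removeStopWords` are hypothetical stand-ins that operate on plain strings, and the exact-match comparison is an assumption; the library's handling of case is not stated in this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; plain strings stand in for Termable objects.
final class StopListSketch {

    /** Stand-in for a stop list: a set of words to be discarded. */
    static final class SimpleStopList {
        private final Set<String> words;

        SimpleStopList(String... words) {
            this.words = new HashSet<>(Arrays.asList(words));
        }

        /** Exact-match membership test (assumption: no case folding). */
        boolean contains(String term) {
            return words.contains(term);
        }
    }

    /** Keeps only the terms that are not in the stop list, preserving order. */
    static List<String> removeStopWords(Iterable<String> terms, SimpleStopList stopList) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!stopList.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SimpleStopList stopList = new SimpleStopList("the", "a", "of");
        List<String> terms = Arrays.asList("the", "index", "of", "terms");
        System.out.println(removeStopWords(terms, stopList)); // prints [index, terms]
    }
}
```
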
Modifier and Type | Method and Description |
---|---|
Vector | `BagOfWordsTransform.convertToVector(java.lang.Iterable<? extends Termable> terms)`: Converts a given list of terms to a vector by counting the occurrence of each term. |
static Vector | `BagOfWordsTransform.convertToVector(java.lang.Iterable<? extends Termable> terms, TermIndex termIndex, VectorFactory<?> vectorFactory)`: Converts a given list of terms to a vector by counting the occurrence of each term. |
Vector | `BagOfWordsTransform.convertToVector(java.lang.Iterable<? extends Termable> terms, VectorFactory<?> vectorFactory)`: Converts a given list of terms to a vector by counting the occurrence of each term. |
Vector | `BagOfWordsTransform.evaluate(java.lang.Iterable<? extends Termable> terms)` |

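
The bag-of-words conversion described above, counting the occurrences of each term into a vector, can be sketched without the library as follows. This is only an illustration under stated assumptions: a `Map<String, Integer>` stands in for `TermIndex`, a `double[]` stands in for `Vector`, index values are assumed to run from 0 to size minus 1, and terms missing from the index are simply skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the BagOfWordsTransform implementation.
final class BagOfWordsSketch {

    /**
     * Builds a count vector with one dimension per indexed term.
     * Assumes the index maps each term to a distinct value in [0, size).
     */
    static double[] convertToVector(Iterable<String> terms, Map<String, Integer> termIndex) {
        double[] vector = new double[termIndex.size()];
        for (String term : terms) {
            Integer position = termIndex.get(term);
            if (position != null) {
                vector[position] += 1.0;
            }
            // Terms absent from the index are skipped (assumption of this sketch).
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> termIndex = new HashMap<>();
        termIndex.put("the", 0);
        termIndex.put("cat", 1);
        termIndex.put("sat", 2);

        List<String> terms = Arrays.asList("the", "cat", "the", "dog");
        double[] vector = convertToVector(terms, termIndex);
        System.out.println(Arrays.toString(vector)); // prints [2.0, 1.0, 0.0]
    }
}
```

The resulting vector's dimensionality depends only on the index, so documents converted against the same index yield directly comparable vectors.
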
Modifier and Type | Interface and Description |
---|---|
interface | `Token`: Interface for a meaningful chunk of text, called a token. |

Modifier and Type | Class and Description |
---|---|
class | `DefaultToken`: A default implementation of the `Token` interface. |