public abstract class AbstractFrequencyBasedGlobalTermWeighter extends AbstractGlobalTermWeighter
An abstract GlobalTermWeighter that keeps track of term frequencies
in documents. For each term, it keeps track of both the document frequency
(the number of documents the term appears in) and the global frequency
(the total number of times the term appears). It also keeps track of the
total number of documents.
Modifier and Type | Field and Description |
---|---|
protected int | documentCount - The number of documents the weight is computed over. |
protected Vector | termDocumentFrequencies - The vector containing the number of documents that each term occurs in. |
protected Vector | termGlobalFrequencies - A vector containing the total number of times that each term occurred in the document set. |
vectorFactory
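The three statistics held in the fields above can be illustrated with a small worked example. The sketch below is not part of the library: `FrequencyFieldsExample` and its methods are hypothetical names, and plain `int` arrays stand in for the library's `Vector` type. It assumes each document is a vector of term counts, where a nonzero entry means the term occurs in that document.

```java
import java.util.Arrays;

/** Hypothetical helper (not a library class): computes the three statistics from raw counts. */
final class FrequencyFieldsExample
{
    /** Document frequency: the number of documents in which each term has a nonzero count. */
    static int[] documentFrequencies(int[][] documents, int dimensionality)
    {
        int[] result = new int[dimensionality];
        for (int[] counts : documents)
        {
            for (int term = 0; term < dimensionality; term++)
            {
                if (counts[term] != 0)
                {
                    result[term]++;
                }
            }
        }
        return result;
    }

    /** Global frequency: the total number of occurrences of each term over all documents. */
    static int[] globalFrequencies(int[][] documents, int dimensionality)
    {
        int[] result = new int[dimensionality];
        for (int[] counts : documents)
        {
            for (int term = 0; term < dimensionality; term++)
            {
                result[term] += counts[term];
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Three documents over a three-term vocabulary.
        int[][] documents = { {2, 0, 1}, {1, 1, 0}, {0, 0, 3} };

        System.out.println("documentCount = " + documents.length);
        System.out.println("termDocumentFrequencies = "
            + Arrays.toString(documentFrequencies(documents, 3)));
        System.out.println("termGlobalFrequencies = "
            + Arrays.toString(globalFrequencies(documents, 3)));
    }
}
```

For these three documents the document count is 3, the document frequencies are [2, 1, 2], and the global frequencies are [3, 1, 4].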
Constructor and Description |
---|
AbstractFrequencyBasedGlobalTermWeighter() - Creates a new AbstractFrequencyBasedGlobalTermWeighter. |
AbstractFrequencyBasedGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory) - Creates a new AbstractFrequencyBasedGlobalTermWeighter. |
Modifier and Type | Method and Description |
---|---|
void | add(Vector counts) - Adds a document to the model. |
AbstractFrequencyBasedGlobalTermWeighter | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
int | getDocumentCount() - Gets the number of documents that this object is using for its model. |
Vector | getTermDocumentFrequencies() - Gets the vector containing the number of documents that each term appears in. |
Vector | getTermGlobalFrequencies() - Gets the vector containing the number of times that each term appears. |
protected void | growVectors(int newDimensionality) - Called when the dimensionality of the term vector grows. |
protected void | initializeVectors(int dimensionality) - Initializes internal vectors to the given dimensionality. |
boolean | remove(Vector counts) - Removes the document from the model. |
protected void | setDocumentCount(int documentCount) - Sets the document count. |
protected void | setTermDocumentFrequencies(Vector termDocumentFrequencies) - Sets the vector containing the number of documents that each term appears in. |
protected void | setTermGlobalFrequencies(Vector termGlobalFrequencies) - Sets the vector containing the number of times that each term appears. |
getVectorFactory, setVectorFactory
add, addAll, remove, removeAll
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getDimensionality, getGlobalWeights
add, addAll, remove, removeAll
protected int documentCount
protected Vector termDocumentFrequencies
protected Vector termGlobalFrequencies
public AbstractFrequencyBasedGlobalTermWeighter()
Creates a new AbstractFrequencyBasedGlobalTermWeighter.
public AbstractFrequencyBasedGlobalTermWeighter(VectorFactory<? extends Vector> vectorFactory)
Creates a new AbstractFrequencyBasedGlobalTermWeighter.
vectorFactory - The vector factory to use.
public AbstractFrequencyBasedGlobalTermWeighter clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and
removes the exception that it throws. Its default behavior is to
automatically create a clone of the exact type of object that the
clone is called on and to copy all primitives but to keep all references,
which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, one that targets
the most common use case for creating a copy of the object. Because
the default behavior is a shallow copy, extending classes only need
to handle fields that need a deeper copy (or those that need to
be reset). Some of the methods in ObjectUtil may be helpful in
implementing a custom clone method.
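The "smart copy" pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `CountsHolderSketch` is a hypothetical class, it uses plain `java.lang.Cloneable` rather than AbstractCloneableSerializable so that it stays self-contained, and it copies its one reference field by hand instead of using ObjectUtil.

```java
/** Hypothetical class (not part of the library) showing a "smart copy" clone. */
final class CountsHolderSketch implements Cloneable
{
    /** A primitive field: the shallow copy made by super.clone() is sufficient. */
    int documentCount;

    /** A mutable reference field: it needs a deeper copy in clone(). */
    double[] frequencies;

    CountsHolderSketch(int documentCount, double[] frequencies)
    {
        this.documentCount = documentCount;
        this.frequencies = frequencies;
    }

    @Override
    public CountsHolderSketch clone()
    {
        try
        {
            // Start from super.clone(), which copies primitives and references.
            CountsHolderSketch result = (CountsHolderSketch) super.clone();

            // Only fields that must not be shared are handled explicitly.
            result.frequencies =
                (this.frequencies == null) ? null : this.frequencies.clone();
            return result;
        }
        catch (CloneNotSupportedException e)
        {
            // Cannot happen because this class implements Cloneable.
            throw new RuntimeException(e);
        }
    }
}
```

With this override, changing the array held by a clone leaves the original object's array untouched, while primitive fields are copied by super.clone() without extra code.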
Note: The contract of this method is that you must use
super.clone() as the basis for your implementation.
Specified by: clone in interface CloneableSerializable
Overrides: clone in class AbstractCloneableSerializable
public void add(Vector counts)
VectorSpaceModel
counts - The document to add to the model.
public boolean remove(Vector counts)
VectorSpaceModel
counts - The document to remove.
protected void initializeVectors(int dimensionality)
dimensionality - The dimensionality to initialize to.
protected void growVectors(int newDimensionality)
newDimensionality - The new dimensionality.
public int getDocumentCount()
VectorSpaceModel
protected void setDocumentCount(int documentCount)
documentCount - The document count.
public Vector getTermDocumentFrequencies()
protected void setTermDocumentFrequencies(Vector termDocumentFrequencies)
termDocumentFrequencies - The document frequencies.
public Vector getTermGlobalFrequencies()
protected void setTermGlobalFrequencies(Vector termGlobalFrequencies)
termGlobalFrequencies - The term global frequencies.
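One way the add, remove, and growVectors methods documented above might maintain the counts incrementally is sketched below. This is an assumption-laden illustration, not the library's code: `FrequencyTrackerSketch` is a hypothetical class, `double[]` arrays stand in for `Vector`, and the choice to return false from remove when the model is empty or the document is wider than the tracked vectors is a guess at reasonable behavior. The sketch also does not check that a removed document was actually added earlier.

```java
import java.util.Arrays;

/** Hypothetical sketch (not a library class) of incremental frequency bookkeeping. */
final class FrequencyTrackerSketch
{
    private int documentCount = 0;
    private double[] termDocumentFrequencies = new double[0];
    private double[] termGlobalFrequencies = new double[0];

    /** Adds a document given as a vector of term counts. */
    void add(double[] counts)
    {
        if (counts.length > this.termDocumentFrequencies.length)
        {
            this.growVectors(counts.length);
        }
        for (int i = 0; i < counts.length; i++)
        {
            if (counts[i] != 0.0)
            {
                this.termDocumentFrequencies[i] += 1.0;     // one more document has the term
                this.termGlobalFrequencies[i] += counts[i]; // more total occurrences
            }
        }
        this.documentCount++;
    }

    /** Reverses the effect of add; returns false if nothing could be removed (assumed behavior). */
    boolean remove(double[] counts)
    {
        if (this.documentCount <= 0
            || counts.length > this.termDocumentFrequencies.length)
        {
            return false;
        }
        for (int i = 0; i < counts.length; i++)
        {
            if (counts[i] != 0.0)
            {
                this.termDocumentFrequencies[i] -= 1.0;
                this.termGlobalFrequencies[i] -= counts[i];
            }
        }
        this.documentCount--;
        return true;
    }

    /** Grows both vectors, padding the new entries with zeros. */
    private void growVectors(int newDimensionality)
    {
        this.termDocumentFrequencies =
            Arrays.copyOf(this.termDocumentFrequencies, newDimensionality);
        this.termGlobalFrequencies =
            Arrays.copyOf(this.termGlobalFrequencies, newDimensionality);
    }

    int getDocumentCount() { return this.documentCount; }
    double[] getTermDocumentFrequencies() { return this.termDocumentFrequencies; }
    double[] getTermGlobalFrequencies() { return this.termGlobalFrequencies; }
}
```

Keeping add and remove symmetric means the counts after removing a document match what they were before it was added, so a subclass can compute its weights from the current counts at any time.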