public class ValenceSpreader<TermType extends java.lang.Comparable<TermType>,DocIdType extends java.lang.Comparable<DocIdType>>
extends java.lang.Object
Modifier and Type | Class | Description |
---|---|---|
static class | ValenceSpreader.Result<TermType,DocIdType> | The return type from running the spreadValence methods. |
Constructor | Description |
---|---|
ValenceSpreader() | Creates an empty valence spreader. |
Modifier and Type | Method | Description |
---|---|---|
void | addDocumentTermOccurrences(DocIdType documentId, java.util.Set<TermType> terms) | Adds the input document, with all of its input terms, to the data. |
void | addDocumentTermWeights(DocIdType documentId, java.util.Map<TermType,java.lang.Double> terms) | Adds the input document, with all of its input terms and their scores (which should be greater than 0), to the data. |
void | addWeightedDocument(DocIdType documentId, double score) | Adds the input document ID with its associated score. |
void | addWeightedDocument(DocIdType documentId, double score, double trust) | Adds the input document ID with its associated score and trust level. |
void | addWeightedTerm(TermType term, double score) | Adds the input term with its associated score. |
void | addWeightedTerm(TermType term, double score, double trust) | Adds the input term with its associated score and trust level. |
void | centerWeightsRange() | This algorithm only works when there are some negative scores and some positive scores. |
void | setIterativeSolverTolerance(double tolerance) | Sets the tolerance that the between-iteration error must fall below before the iterative solver is considered done. |
void | setNumThreads(int numThreads) | Specifies how many threads to use in the matrix/vector multiplies in the iterative solver. |
ValenceSpreader.Result<TermType,DocIdType> | spreadValence() | Solves the system of equations to determine the valence of all input documents and of all terms in those documents. |
ValenceSpreader.Result<TermType,DocIdType> | spreadValence(int power) | Solves the system of equations to determine the valence of all input documents and of all terms in those documents. |
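The setNumThreads entry says the iterative solver's matrix/vector multiplies can use several threads. The block below is a minimal, self-contained sketch of one way to do such a multiply, y = A x, by splitting the rows of a dense matrix across a fixed-size thread pool. The class and method names (ParallelMatVec, multiply) are hypothetical, and this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMatVec {
    /** Computes y = A x, splitting the rows of A into contiguous blocks, one task per block. */
    public static double[] multiply(double[][] a, double[] x, int numThreads) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be at least 1");
        }
        int rows = a.length;
        double[] y = new double[rows];
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int chunk = (rows + numThreads - 1) / numThreads; // ceiling division
            for (int start = 0; start < rows; start += chunk) {
                final int lo = start;
                final int hi = Math.min(rows, start + chunk);
                futures.add(pool.submit(() -> {
                    for (int i = lo; i < hi; i++) {
                        double sum = 0.0;
                        for (int j = 0; j < x.length; j++) {
                            sum += a[i][j] * x[j];
                        }
                        y[i] = sum; // each task writes only its own rows of y
                    }
                }));
            }
            // Future.get() waits for each task and makes its writes to y visible here.
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during multiply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[] x = {1, 1};
        System.out.println(Arrays.toString(multiply(a, x, 2))); // [3.0, 7.0, 11.0]
    }
}
```

Because each task owns a disjoint block of rows, no locking is needed. The result does not depend on the thread count; only the work split changes.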
public ValenceSpreader()
public void setNumThreads(int numThreads)
numThreads
- The number of threads to use
public void setIterativeSolverTolerance(double tolerance)
tolerance
- The between-iteration error must fall below this value before the solver completes
public void addWeightedTerm(TermType term, double score)
term
- The term with the associated score
score
- The score for the input term
public void addWeightedTerm(TermType term, double score, double trust)
term
- The term with its associated values
score
- The score for the input term
trust
- The amount to trust the input score (should be greater than 0).
What matters is how this trust value compares with the other trust
values input.
public void addWeightedDocument(DocIdType documentId, double score)
documentId
- The document ID that refers to a document added via one
of the addDocumentTerm* methods.
score
- The score for the input document
public void addWeightedDocument(DocIdType documentId, double score, double trust)
documentId
- The document ID that refers to a document added via one
of the addDocumentTerm* methods.
score
- The score for the input document
trust
- The amount to trust the input score (should be greater than 0).
This only matters relative to the other trust values: higher values
are trusted more.
public void addDocumentTermOccurrences(DocIdType documentId, java.util.Set<TermType> terms)
documentId
- The unique ID for this document. If the same ID is used
more than once, the earlier data will be replaced with the new data.
terms
- The set of terms that occur in the document
public void addDocumentTermWeights(DocIdType documentId, java.util.Map<TermType,java.lang.Double> terms)
documentId
- The unique ID for this document. If the same ID is used
more than once, the earlier data will be replaced with the new data.
terms
- The terms and their associated scores in this document (a score
can be TF, TF-IDF, etc.)
public void centerWeightsRange()
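The addDocumentTermWeights parameters accept per-document term scores such as TF or TF-IDF. As a hedged sketch, the hypothetical TfIdfSketch.tfIdf helper below builds one such term-to-score map per document from already tokenized text, using raw counts for TF and ln(N / df) for IDF (one common variant among several). Terms whose weight comes out as 0, such as terms that appear in every document, are dropped, since the method summary says scores should be greater than 0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {
    /**
     * Returns, for each document ID, a map from term to TF-IDF weight.
     * TF is the raw count of the term in the document; IDF is ln(N / df),
     * where N is the number of documents and df the number containing the term.
     */
    public static Map<String, Map<String, Double>> tfIdf(Map<String, List<String>> corpus) {
        int n = corpus.size();
        // Document frequency: count each term at most once per document.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : corpus.values()) {
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Map<String, Double>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> doc : corpus.entrySet()) {
            // Raw term counts for this document.
            Map<String, Integer> tf = new HashMap<>();
            for (String term : doc.getValue()) {
                tf.merge(term, 1, Integer::sum);
            }
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                double weight = e.getValue() * idf;
                if (weight > 0) {
                    weights.put(e.getKey(), weight); // keep strictly positive scores only
                }
            }
            result.put(doc.getKey(), weights);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> corpus = new HashMap<>();
        corpus.put("d1", List.of("good", "movie", "good"));
        corpus.put("d2", List.of("bad", "movie"));
        // "movie" occurs in both documents, so its IDF is ln(1) = 0 and it is dropped;
        // d1 keeps only "good", with weight 2 * ln 2.
        System.out.println(tfIdf(corpus).get("d1"));
    }
}
```

Each inner map has the shape that addDocumentTermWeights takes for one document (here with String standing in for TermType).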
public ValenceSpreader.Result<TermType,DocIdType> spreadValence()
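spreadValence is described as solving a system of equations with an iterative solver, whose stopping rule is set by setIterativeSolverTolerance. The documentation does not say which solver is used, so the block below uses plain Jacobi iteration purely to illustrate a tolerance-based stopping rule; JacobiSketch and its parameters are hypothetical. The between-iteration error is assumed here to be the largest absolute change in any component; the library may use a different measure.

```java
import java.util.Arrays;

public class JacobiSketch {
    /**
     * Solves A x = b by Jacobi iteration, starting from x = 0. Stops once the largest
     * absolute change in any component between two iterations is below tolerance,
     * or after maxIterations. Convergence is guaranteed if A is strictly diagonally dominant.
     */
    public static double[] solve(double[][] a, double[] b, double tolerance, int maxIterations) {
        int n = b.length;
        double[] x = new double[n];
        double[] next = new double[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            double maxChange = 0.0;
            for (int i = 0; i < n; i++) {
                if (a[i][i] == 0.0) {
                    throw new IllegalArgumentException("zero on the diagonal at row " + i);
                }
                double sum = b[i];
                for (int j = 0; j < n; j++) {
                    if (j != i) {
                        sum -= a[i][j] * x[j]; // uses only the previous iterate
                    }
                }
                next[i] = sum / a[i][i];
                maxChange = Math.max(maxChange, Math.abs(next[i] - x[i]));
            }
            double[] tmp = x; // swap buffers so x holds the newest iterate
            x = next;
            next = tmp;
            if (maxChange < tolerance) {
                break; // between-iteration error is below the tolerance: done
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1}, {1, 3}};
        double[] b = {6, 7};
        // The exact solution is x = [1, 2]; the printed values are very close to it.
        System.out.println(Arrays.toString(solve(a, b, 1e-12, 1000)));
    }
}
```

A smaller tolerance trades more iterations for a more accurate answer; the maxIterations cap keeps a non-converging system from looping forever.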
public ValenceSpreader.Result<TermType,DocIdType> spreadValence(int power)
power
- This correlates with how far the influence of the scored values
spreads. A power of 0 (not permitted) would not spread at all. A power
of 1 only spreads scores from documents to their terms or from terms to
their documents. The power correlates with the distance of the spread
but does not match it exactly. In our experience, 10 has worked well
for this parameter.
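To make the hop-distance reading of power concrete, the hypothetical HopSpreadSketch below runs power synchronous averaging steps over a bipartite document-term graph: seeded nodes keep their seed score, and every other node takes the mean of its neighbors' scores from the previous step. This only illustrates reach and is not the library's system of equations. In this toy the reach matches the power exactly, whereas the documentation notes that the library's spread only correlates with it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HopSpreadSketch {
    /**
     * Runs `power` synchronous averaging steps over a bipartite document-term graph.
     * Node keys are "doc:<id>" and "term:<term>". Seeded nodes keep their seed score;
     * every other node takes the mean of its neighbors' scores from the previous step,
     * so after k steps only nodes within k hops of a seed can be nonzero.
     */
    public static Map<String, Double> spread(Map<String, Set<String>> docTerms,
                                             Map<String, Double> seeds, int power) {
        if (power < 1) {
            throw new IllegalArgumentException("power must be at least 1");
        }
        // Undirected adjacency list linking each document to its terms.
        Map<String, List<String>> adj = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : docTerms.entrySet()) {
            String doc = "doc:" + e.getKey();
            List<String> docNeighbors = adj.computeIfAbsent(doc, k -> new ArrayList<>());
            for (String term : e.getValue()) {
                String termNode = "term:" + term;
                docNeighbors.add(termNode);
                adj.computeIfAbsent(termNode, k -> new ArrayList<>()).add(doc);
            }
        }
        // Start from the seeds; every unseeded node starts at 0.
        Map<String, Double> current = new HashMap<>();
        for (String node : adj.keySet()) {
            current.put(node, seeds.getOrDefault(node, 0.0));
        }
        for (int step = 0; step < power; step++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, List<String>> e : adj.entrySet()) {
                String node = e.getKey();
                List<String> neighbors = e.getValue();
                if (seeds.containsKey(node)) {
                    next.put(node, seeds.get(node)); // seeds are held fixed
                } else if (neighbors.isEmpty()) {
                    next.put(node, 0.0);
                } else {
                    double sum = 0.0;
                    for (String nb : neighbors) {
                        sum += current.get(nb);
                    }
                    next.put(node, sum / neighbors.size());
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = new HashMap<>();
        docs.put("d1", Set.of("great", "plot"));
        docs.put("d2", Set.of("plot", "boring"));
        Map<String, Double> seeds = Map.of("doc:d1", 1.0);
        System.out.println(spread(docs, seeds, 1).get("doc:d2")); // 0.0
        System.out.println(spread(docs, seeds, 2).get("doc:d2")); // 0.25
    }
}
```

With only d1 seeded, a power of 1 reaches d1's terms but leaves d2 at 0.0, because d2 is two hops away through the shared term "plot". A power of 2 lets that term's score (0.5) reach d2, which averages it with "boring" (0.0) to get 0.25.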