ValenceSpreader (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.text.algorithm.ValenceSpreader<TermType,DocIdType>

```
public class ValenceSpreader<TermType extends java.lang.Comparable<TermType>,DocIdType extends java.lang.Comparable<DocIdType>>
extends java.lang.Object
```
This class wraps the MultipartiteValenceMatrix class to simplify the interface for the most common valence task: ranking a set of documents based on a small set of scored documents and/or a set of scored terms. The algorithm only works when there are some negative scores and some positive scores. However, some datasets (such as ANEW) score from [0 ... 10] or similar. If your labels are like ANEW (all scores non-negative rather than balanced around zero), call centerWeightsRange to ensure there are some negative and some positive scores. This class also serves as an example of how to call MultipartiteValenceMatrix if you have a different application and just want to see how it's done.

Author:

jdwendt

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`ValenceSpreader.Result<TermType,DocIdType>` The return type from running the spreadValence methods.

Constructor Summary

Constructors
Constructor and Description
`ValenceSpreader()` Creates an empty valence spreader.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`addDocumentTermOccurrences(DocIdType documentId, java.util.Set<TermType> terms)` Adds the input document with all of the input terms in the data.
`void`	`addDocumentTermWeights(DocIdType documentId, java.util.Map<TermType,java.lang.Double> terms)` Adds the input document with all of the input terms with their input scores (should be greater than 0) to the data.
`void`	`addWeightedDocument(DocIdType documentId, double score)` Adds the input documentId with its associated score.
`void`	`addWeightedDocument(DocIdType documentId, double score, double trust)` Adds the input documentId with its associated score/trust.
`void`	`addWeightedTerm(TermType term, double score)` Adds the input term with its associated score.
`void`	`addWeightedTerm(TermType term, double score, double trust)` Adds the input term with its associated score and trust level.
`void`	`centerWeightsRange()` Recenters the term scores and the document scores (independently) to range from -1 to 1, for datasets whose scores are not already balanced around zero.
`void`	`setIterativeSolverTolerance(double tolerance)` The tolerance that between-iteration error must be below before considering the iterative solver "done".
`void`	`setNumThreads(int numThreads)` Specifies how many threads to use in the matrix/vector multiplies in the iterative solver.
`ValenceSpreader.Result<TermType,DocIdType>`	`spreadValence()` This method solves the system of equations to determine the valence for all documents input and for all terms in those documents.
`ValenceSpreader.Result<TermType,DocIdType>`	`spreadValence(int power)` This method solves the system of equations to determine the valence for all documents input and for all terms in those documents.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ValenceSpreader
```
public ValenceSpreader()
```
    Creates an empty valence spreader. After initialization, documents and some set of scores must be passed in.
- Method Detail
  - setNumThreads
```
public void setNumThreads(int numThreads)
```
    Specifies how many threads to use for the matrix/vector multiplications in the iterative solver. More threads are not necessarily better: on many small tests (<100 documents) a single thread has been best. We have run matrices with up to several million entries (including documents and terms) with only about 10 threads. You do not need to call this method before solving, because the thread count is initialized to a reasonable default (2).
    
    Parameters:
    
    numThreads - The number of threads to use
  - setIterativeSolverTolerance
```
public void setIterativeSolverTolerance(double tolerance)
```
    The tolerance that the between-iteration error must fall below before the iterative solver is considered "done". This essentially maps to the L2 error of the result, and a smaller tolerance makes the solver take longer to complete. The default is 1e-5, but you can change it.
    
    Parameters:
    
    tolerance - The error must go below this before the solver completes
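
    To illustrate what a between-iteration tolerance means, the following is a minimal, self-contained sketch. It is not the solver used by this class: it runs plain Jacobi iteration on a tiny hand-picked system and stops once the L2 norm of the change between successive iterates drops below the tolerance. The class name ToleranceSketch and the method iterationsUntilConverged are hypothetical and exist only for this illustration.

```java
public class ToleranceSketch {
    /**
     * Hypothetical helper: runs Jacobi iteration on a x = b and returns the
     * number of iterations taken before the L2 norm of (next - current)
     * falls below the tolerance, or maxIterations if it never does.
     */
    public static int iterationsUntilConverged(double[][] a, double[] b,
                                               double tolerance, int maxIterations) {
        int n = b.length;
        double[] x = new double[n]; // start from the zero vector
        for (int iter = 1; iter <= maxIterations; iter++) {
            double[] next = new double[n];
            double sumSq = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = b[i];
                for (int j = 0; j < n; j++) {
                    if (j != i) {
                        sum -= a[i][j] * x[j];
                    }
                }
                next[i] = sum / a[i][i];
                double delta = next[i] - x[i];
                sumSq += delta * delta;
            }
            x = next;
            // Between-iteration error: L2 norm of the change in the iterate.
            if (Math.sqrt(sumSq) < tolerance) {
                return iter;
            }
        }
        return maxIterations;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1}, {1, 3}}; // diagonally dominant, so Jacobi converges
        double[] b = {1, 2};
        System.out.println("tolerance 1e-2: " + iterationsUntilConverged(a, b, 1e-2, 1000));
        System.out.println("tolerance 1e-5: " + iterationsUntilConverged(a, b, 1e-5, 1000));
    }
}
```

    In this toy, tightening the tolerance increases the iteration count, which is the same trade-off between accuracy and run time described above.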
  - addWeightedTerm
```
public void addWeightedTerm(TermType term,
                            double score)
```
    Adds the input term with its associated score. Note that this term/score pair will only be used when solving the system if some document uses that term at least once.
    
    Parameters:
    
    term - The term with the associated score
    
    score - The score for the input term
  - addWeightedTerm
```
public void addWeightedTerm(TermType term,
                            double score,
                            double trust)
```
    Adds the input term with its associated score and trust level. Note that this term/score/trust tuple will only be used when solving the system if some document uses that term at least once.
    
    Parameters:
    
    term - The term with its associated values
    
    score - The score for the input term
    
    trust - The amount to trust the input score. Should be greater than 0. What matters is how this trust value ranks relative to the other trust values passed in.
  - addWeightedDocument
```
public void addWeightedDocument(DocIdType documentId,
                                double score)
```
    Adds the input documentId with its associated score. Note that this documentId/score will only be used when solving if a document was added with this ID.
    
    Parameters:
    
    documentId - The document id that refers to a document added via one of the addDocumentTerm* methods.
    
    score - The score for the input document
  - addWeightedDocument
```
public void addWeightedDocument(DocIdType documentId,
                                double score,
                                double trust)
```
    Adds the input documentId with its associated score/trust. Note that this will only be used when solving if a document was added with the input ID.
    
    Parameters:
    
    documentId - The document id that refers to a document added via one of the addDocumentTerm* methods.
    
    score - The score for the input document
    
    trust - The amount to trust the input score (should be greater than 0). This only matters in relation to other trust scores -- higher scores are trusted more.
  - addDocumentTermOccurrences
```
public void addDocumentTermOccurrences(DocIdType documentId,
                                       java.util.Set<TermType> terms)
```
    Adds the input document with all of the input terms in the data. Note that this method and addDocumentTermWeights should be mutually exclusive methods: It doesn't make sense to add one document via this method and another via the other.
    
    Parameters:
    
    documentId - The unique ID for this document. If the same id is used more than once, the earlier data will be replaced with the new data.
    
    terms - The set of terms that occur in the document
  - addDocumentTermWeights
```
public void addDocumentTermWeights(DocIdType documentId,
                                   java.util.Map<TermType,java.lang.Double> terms)
```
    Adds the input document with all of the input terms with their input scores (should be greater than 0) to the data. Note that this method and addDocumentTermOccurrences should be mutually exclusive methods: It doesn't make sense to add one document via this method and another via the other.
    
    Parameters:
    
    documentId - The unique ID for this document. If the same id is used more than once, the earlier data will be replaced with the new data.
    
    terms - The set of terms and their associated scores from this document (score can be TF, TF-IDF, etc.)
  - centerWeightsRange
```
public void centerWeightsRange()
```
    This algorithm only works when there are some negative scores and some positive scores. However, some datasets (such as ANEW) score from [0 ... 10] or similar. This recenters both the term scores and document scores to go from -1 to 1. Note that the two sets of scores are centered independently, so if you want to have only positive term scores and only negative document scores, don't call this method.
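
    The recentering described above can be sketched as follows. This page does not state the exact formula centerWeightsRange uses, so the sketch assumes a simple linear min-max mapping (the smallest score goes to -1, the largest to 1). The class name CenterRangeSketch and the method rescaleToUnitRange are hypothetical; this is not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class CenterRangeSketch {
    /**
     * Hypothetical helper: linearly maps the scores so that the minimum
     * becomes -1 and the maximum becomes 1. ASSUMPTION: min-max scaling;
     * the actual centerWeightsRange formula is not given in this document.
     */
    public static <K> Map<K, Double> rescaleToUnitRange(Map<K, Double> scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        Map<K, Double> centered = new HashMap<>();
        for (Map.Entry<K, Double> e : scores.entrySet()) {
            // If every score is identical there is no range to stretch; use 0.
            double v = (max == min)
                    ? 0.0
                    : 2.0 * (e.getValue() - min) / (max - min) - 1.0;
            centered.put(e.getKey(), v);
        }
        return centered;
    }

    public static void main(String[] args) {
        // ANEW-style term scores that are all numerically positive.
        Map<String, Double> termScores = new HashMap<>();
        termScores.put("sad", 1.0);
        termScores.put("neutral", 5.5);
        termScores.put("happy", 10.0);
        System.out.println(rescaleToUnitRange(termScores));
    }
}
```

    Applying such a helper separately to the term scores and to the document scores mirrors the note above that the two sets are centered independently.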
  - spreadValence
```
public ValenceSpreader.Result<TermType,DocIdType> spreadValence()
```
    This method solves the system of equations to determine the valence for all documents input and for all terms in those documents. Before calling this method, call an addDocumentTerm* method once for each document, and call the addWeighted* methods with some positive and some negative scores. If your "positive" and "negative" labels are all numerically positive, also call centerWeightsRange before calling this method. This version uses the default power of 10, which has generally worked well in previous experiments.
    
    Returns:
    
    The results of spreading the valence -- The term weights can be used in the future as a classifier; the document weights can be used independently to identify which documents are most extreme on either end.
  - spreadValence
```
public ValenceSpreader.Result<TermType,DocIdType> spreadValence(int power)
```
    This method solves the system of equations to determine the valence for all documents input and for all terms in those documents. Before calling this method, call an addDocumentTerm* method once for each document, and call the addWeighted* methods with some positive and some negative scores. If your "positive" and "negative" labels are all numerically positive, also call centerWeightsRange before calling this method.
    
    Parameters:
    
    power - Controls how far the influence of the scored values spreads. A power of 0 (not permitted) would not spread at all. A power of 1 only spreads scores from a document to its terms or from a term to its documents. The power correlates with the distance of the spread but does not match it exactly. In our experience, 10 has been a good value for this parameter.
    
    Returns:
    
    The results of spreading the valence -- The term weights can be used in the future as a classifier; the document weights can be used independently to identify which documents are most extreme on either end.
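
    To give some intuition for the power parameter, here is a toy sketch of spread distance on a document/term graph. It is not the MultipartiteValenceMatrix computation: it treats the power as an exact hop count, whereas the description above says the real parameter only correlates with distance. The class SpreadReachSketch, its methods, and the sample corpus are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SpreadReachSketch {
    /** Hypothetical three-document corpus; document ids and terms are distinct strings. */
    public static Map<String, Set<String>> sampleCorpus() {
        Map<String, Set<String>> docTerms = new HashMap<>();
        docTerms.put("d1", new HashSet<>(Arrays.asList("good", "movie")));
        docTerms.put("d2", new HashSet<>(Arrays.asList("movie", "plot")));
        docTerms.put("d3", new HashSet<>(Arrays.asList("plot", "bad")));
        return docTerms;
    }

    /**
     * Returns every node (document id or term) that lies within maxHops edges
     * of a seed node. ASSUMPTION: document ids and terms never collide.
     */
    public static Set<String> reachable(Map<String, Set<String>> docTerms,
                                        Set<String> seeds, int maxHops) {
        // Build an undirected adjacency list: document <-> each of its terms.
        Map<String, Set<String>> adjacency = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : docTerms.entrySet()) {
            for (String term : e.getValue()) {
                adjacency.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(term);
                adjacency.computeIfAbsent(term, k -> new HashSet<>()).add(e.getKey());
            }
        }
        Set<String> seen = new TreeSet<>(seeds);
        Set<String> frontier = new HashSet<>(seeds);
        for (int hop = 0; hop < maxHops; hop++) {
            Set<String> next = new HashSet<>();
            for (String node : frontier) {
                for (String neighbor : adjacency.getOrDefault(node, Collections.emptySet())) {
                    if (seen.add(neighbor)) {
                        next.add(neighbor); // first time this node is reached
                    }
                }
            }
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<String> seeds = Collections.singleton("d1"); // the only scored document
        for (int hops = 1; hops <= 3; hops++) {
            System.out.println(hops + " hop(s): " + reachable(sampleCorpus(), seeds, hops));
        }
    }
}
```

    In this toy, one hop from a scored document reaches only its own terms, matching the description of a power of 1; each further hop reaches documents or terms one step farther away.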
