gov.sandia.cognition.statistics

## Class ChiSquaredSimilarity

• java.lang.Object
• gov.sandia.cognition.statistics.ChiSquaredSimilarity

• ```@PublicationReference(author="Yao-Tsung Chen, Meng Chang Chen",
title="Using chi-square statistics to measure similarities for text categorization",
type=Journal,
year=2011,
url="http://www.sciencedirect.com/science/article/pii/S0957417410008961#")
public class ChiSquaredSimilarity
extends java.lang.Object```
A class for computing the chi-squared similarity between two vectors. A chi-squared test requires frequency vectors, typically representing documents, so all values in the vectors are treated as non-negative. The test assumes that one vector represents a document in a known category and that the other vector is being tested to see whether it is likely drawn from the same distribution. Because the test is symmetric, the choice of which vector is the categorized vector and which is the testing vector is arbitrary.
Since:
3.4.2
Author:
trbroun
• ### Constructor Summary

Constructors
Constructor and Description
```ChiSquaredSimilarity(Vector categorizedVector, Vector testingVector)```
Basic constructor.
• ### Method Summary

All Methods
Modifier and Type Method and Description
`double` `compute()`
Computes the chi-squared statistic of the two vectors.
`double` `computeCumulativeProbabilityValue()`
Computes the chi-squared similarity statistic, then uses that to compute a cumulative probability.
`Vector` `getCategorizedVector()`
Basic getter for the categorized vector.
`Vector` `getTestVector()`
Basic getter for the testing vector.
`void` `setCategorizedVector(Vector newCategorizedVector)`
Basic setter for the categorized vector.
`void` `setTestVector(Vector newTestVector)`
Basic setter for the test vector.
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Constructor Detail

• #### ChiSquaredSimilarity

```public ChiSquaredSimilarity(Vector categorizedVector,
Vector testingVector)```
Basic constructor. Stores the two supplied vectors as the categorized vector and the testing vector.
Parameters:
`categorizedVector` - The vector from a known category.
`testingVector` - The vector which is being tested to see if it comes from the same category.
• ### Method Detail

• #### setCategorizedVector

`public void setCategorizedVector(Vector newCategorizedVector)`
Basic setter for the categorized vector.
Parameters:
`newCategorizedVector` - The new categorized vector.
• #### setTestVector

`public void setTestVector(Vector newTestVector)`
Basic setter for the test vector.
Parameters:
`newTestVector` - The new test vector.
• #### getCategorizedVector

`public Vector getCategorizedVector()`
Basic getter for the categorized vector.
Returns:
The categorized vector.
• #### getTestVector

`public Vector getTestVector()`
Basic getter for the testing vector.
Returns:
The test vector.
• #### compute

`public double compute()`
Computes the chi-squared statistic of the two vectors. The result is the raw statistic, not a probability; to obtain a probability, evaluate it against a chi-squared distribution, as `computeCumulativeProbabilityValue()` does. Both vectors must be non-zero.
Returns:
The chi-squared statistic.
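To make the statistic concrete, the following self-contained sketch computes a chi-squared statistic for two frequency vectors by treating them as the two rows of a contingency table. It is an illustration under stated assumptions, not the library's implementation: the class `ChiSquaredStatisticSketch`, its `statistic` method, the use of plain `double[]` arrays in place of `Vector`, and the choice of the standard homogeneity formula are all hypothetical, and the exact formula used by `compute()` may differ.

```java
/** Hypothetical helper for illustration; not part of gov.sandia.cognition. */
final class ChiSquaredStatisticSketch {

    /**
     * Chi-squared statistic for two frequency vectors, treated as the two
     * rows of a 2-by-k contingency table (assumed formula, see lead-in).
     */
    static double statistic(double[] categorized, double[] testing) {
        if (categorized.length != testing.length) {
            throw new IllegalArgumentException("Vectors must have equal length.");
        }
        double totalA = 0.0;
        double totalB = 0.0;
        for (int i = 0; i < categorized.length; i++) {
            // Assumption: negative frequencies are rejected rather than fixed up.
            if (categorized[i] < 0.0 || testing[i] < 0.0) {
                throw new IllegalArgumentException("Frequencies must be non-negative.");
            }
            totalA += categorized[i];
            totalB += testing[i];
        }
        if (totalA == 0.0 || totalB == 0.0) {
            throw new IllegalArgumentException("Both vectors must be non-zero.");
        }
        double grandTotal = totalA + totalB;
        double chiSquared = 0.0;
        for (int i = 0; i < categorized.length; i++) {
            double columnTotal = categorized[i] + testing[i];
            if (columnTotal == 0.0) {
                continue; // A term absent from both vectors contributes nothing.
            }
            // Expected counts if both rows shared one underlying distribution.
            double expectedA = totalA * columnTotal / grandTotal;
            double expectedB = totalB * columnTotal / grandTotal;
            double diffA = categorized[i] - expectedA;
            double diffB = testing[i] - expectedB;
            chiSquared += diffA * diffA / expectedA + diffB * diffB / expectedB;
        }
        return chiSquared;
    }

    public static void main(String[] args) {
        double[] known = {12, 7, 0, 3};
        double[] candidate = {10, 9, 1, 2};
        System.out.println(statistic(known, candidate));
    }
}
```

Swapping the two arguments leaves the result unchanged in this sketch, which matches the symmetry noted in the class description.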
• #### computeCumulativeProbabilityValue

`public double computeCumulativeProbabilityValue()`
Computes the chi-squared similarity statistic, then uses that to compute a cumulative probability. Returns the probability that a chi-squared statistic falls between 0 and the critical value (the computed chi-squared statistic for the two supplied vectors). Because the cumulative distribution is non-decreasing, a larger chi-squared value yields a larger cumulative probability.
Returns:
The probability of a chi-squared statistic being lower than the value of the chi-squared similarity of the given vectors.
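As a rough illustration of how a raw statistic maps to a cumulative probability, the sketch below evaluates the chi-squared cumulative distribution function with a power series for the regularized lower incomplete gamma function. Everything here is an assumption made for illustration: the class `ChiSquaredCdfSketch` and its methods are hypothetical, the degrees of freedom are supplied by the caller because this page does not say how the library chooses them, and the plain series is only suitable for moderate statistic values.

```java
/** Hypothetical helper for illustration; not part of gov.sandia.cognition. */
final class ChiSquaredCdfSketch {

    /** ln Gamma(twiceZ / 2) for integer twiceZ >= 1, via Gamma(z + 1) = z * Gamma(z). */
    static double logGammaHalfInteger(int twiceZ) {
        boolean even = (twiceZ % 2 == 0);
        double z = even ? 1.0 : 0.5;                            // Start at Gamma(1) or Gamma(1/2).
        double logGamma = even ? 0.0 : 0.5 * Math.log(Math.PI); // Gamma(1/2) = sqrt(pi).
        double target = twiceZ / 2.0;
        while (z < target - 1e-9) {
            logGamma += Math.log(z);
            z += 1.0;
        }
        return logGamma;
    }

    /** P(X <= x) for a chi-squared variable X with the given degrees of freedom. */
    static double cdf(double x, int degreesOfFreedom) {
        if (degreesOfFreedom < 1) {
            throw new IllegalArgumentException("Degrees of freedom must be at least 1.");
        }
        if (x <= 0.0) {
            return 0.0;
        }
        double a = degreesOfFreedom / 2.0;
        double halfX = x / 2.0;
        // Series: sum over n of halfX^n / ((a + 1)(a + 2)...(a + n)), starting at 1.
        double term = 1.0;
        double sum = 1.0;
        for (int n = 1; n < 100000; n++) {
            term *= halfX / (a + n);
            sum += term;
            if (term < sum * 1e-16) {
                break;
            }
        }
        // Prefactor halfX^a * exp(-halfX) / Gamma(a + 1), computed in log space.
        // Caveat: for very large x the series can overflow; this sketch does not handle that.
        double logPrefactor = a * Math.log(halfX) - halfX
            - logGammaHalfInteger(degreesOfFreedom + 2);
        return Math.min(1.0, Math.exp(logPrefactor) * sum);
    }

    public static void main(String[] args) {
        System.out.println(cdf(20.0, 3));
    }
}
```

In production code an established statistics library would be a safer choice than a hand-rolled series, since it handles large arguments and extreme tail probabilities more carefully.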