VectorThresholdInformationGainLearner (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.util.AbstractCloneableSerializable
  - gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner<OutputType>
    - gov.sandia.cognition.learning.algorithm.tree.VectorThresholdInformationGainLearner<OutputType>

Type Parameters:

OutputType - The output type of the data.

All Implemented Interfaces:

BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>>,VectorElementThresholdCategorizer>, DimensionFilterableLearner, DeciderLearner<Vectorizable,OutputType,java.lang.Boolean,VectorElementThresholdCategorizer>, PriorWeightedNodeLearner<OutputType>, VectorThresholdLearner<OutputType>, CloneableSerializable, java.io.Serializable, java.lang.Cloneable
```
public class VectorThresholdInformationGainLearner<OutputType>
extends AbstractVectorThresholdMaximumGainLearner<OutputType>
implements PriorWeightedNodeLearner<OutputType>
```
The VectorThresholdInformationGainLearner computes the best threshold over a dataset of vectors using information gain to determine the optimal index and threshold. This is an implementation of what is used in the C4.5 decision tree algorithm.

Information gain for a given split (sets X and Y) for two categories (a and b):

ig(X, Y) = entropy(X + Y)
           - (|X| / (|X| + |Y|)) entropy(X)
           - (|Y| / (|X| + |Y|)) entropy(Y)

with

entropy(Z) = - (Za / |Z|) log2(Za / |Z|) - (Zb / |Z|) log2(Zb / |Z|)

where Za = number of a's in Z, and Zb = number of b's in Z.

In the multi-class case, the entropy is defined as the sum over all of the categories (c) of -(Zc / |Z|) log2(Zc / |Z|).
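
The formulas above can be checked by hand with a small standalone sketch. The class `InformationGainSketch` and its methods are hypothetical names used only for illustration; they are not part of Cognitive Foundry and do not reproduce how this class stores counts or applies priors. The sketch assumes that an empty set has zero entropy and that a category with no members contributes nothing to the sum.

```java
/** Hypothetical helper for illustration; not part of Cognitive Foundry. */
final class InformationGainSketch {

    /** Entropy, in bits, of a set described by its per-category counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0; // Assumption: an empty set contributes no entropy.
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) { // Skip empty categories: 0 * log2(0) is taken as 0.
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    /** ig(X, Y), where x[i] and y[i] count category i on each side of the split. */
    static double informationGain(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must cover the same categories");
        }
        int[] all = new int[x.length];
        int sizeX = 0;
        int sizeY = 0;
        for (int i = 0; i < x.length; i++) {
            all[i] = x[i] + y[i];
            sizeX += x[i];
            sizeY += y[i];
        }
        int size = sizeX + sizeY;
        if (size == 0) {
            return 0.0;
        }
        return entropy(all)
            - ((double) sizeX / size) * entropy(x)
            - ((double) sizeY / size) * entropy(y);
    }

    public static void main(String[] args) {
        // A split that separates 4 a's from 4 b's, then one that separates nothing.
        System.out.println(informationGain(new int[] {4, 0}, new int[] {0, 4}));
        System.out.println(informationGain(new int[] {2, 2}, new int[] {2, 2}));
    }
}
```

Because `entropy` loops over an array of counts, the same method covers both the two-category formula and the multi-class sum.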

Since:

2.0

Author:

Justin Basilico

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.ArrayList<OutputType>`	`categories` The categories for the prior.
`protected int[]`	`categoryCounts` The counts for each category.
`protected double[]`	`categoryPriors` The priors for each category.
`protected double[]`	`categoryProbabilities` Scratch space used when computing weighted entropy.

Fields inherited from class gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner
DEFAULT_MIN_SPLIT_SIZE, dimensionsToConsider, minSplitSize

Constructor Summary

Constructors
Constructor and Description
`VectorThresholdInformationGainLearner()` Creates a new `VectorThresholdInformationGainLearner`.
`VectorThresholdInformationGainLearner(int minSplitSize)` Creates a new `VectorThresholdInformationGainLearner`.

Method Summary

Modifier and Type	Method and Description
`VectorThresholdInformationGainLearner<OutputType>`	`clone()` This makes public the clone method on the `Object` class and removes the exception that it throws.
`double`	`computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)` Computes the gain of a given split.
`void`	`configure(java.util.Map<OutputType,java.lang.Double> priors, java.util.Map<OutputType,java.lang.Integer> trainCounts)` Configure the node learner with prior weights and training counts.

Methods inherited from class gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner
computeBestGainAndThreshold, computeBestGainAndThreshold, getDimensionsToConsider, getMinSplitSize, learn, setDimensionsToConsider, setMinSplitSize

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - categories
```
protected java.util.ArrayList<OutputType> categories
```
    The categories for the prior.
  - categoryPriors
```
protected double[] categoryPriors
```
    The priors for each category.
  - categoryCounts
```
protected int[] categoryCounts
```
    The counts for each category.
  - categoryProbabilities
```
protected double[] categoryProbabilities
```
    Scratch space used when computing weighted entropy. It is declared as a field so that it can be allocated once, instead of during every entropy evaluation.
- Constructor Detail
  - VectorThresholdInformationGainLearner
```
public VectorThresholdInformationGainLearner()
```
    Creates a new VectorThresholdInformationGainLearner.
  - VectorThresholdInformationGainLearner
```
public VectorThresholdInformationGainLearner(int minSplitSize)
```
    Creates a new VectorThresholdInformationGainLearner.
    
    Parameters:
    
    minSplitSize - The minimum split size. Must be positive.
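    
    To illustrate how a minimum split size can constrain a threshold search, here is a standalone sketch over a single dimension with boolean labels. Everything in it is an assumption made for illustration: the class `ThresholdSearchSketch` is hypothetical, the minimum split size is read as a lower bound on the number of examples on each side of the split, and the threshold is placed at the midpoint between neighboring values. None of these details are confirmed by this page, and the library's own search may differ.

```java
import java.util.Arrays;

/** Hypothetical single-dimension threshold search; not Cognitive Foundry code. */
final class ThresholdSearchSketch {

    /** Entropy, in bits, of a set with the given numbers of true and false labels. */
    static double entropy(int trueCount, int falseCount) {
        int total = trueCount + falseCount;
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : new int[] {trueCount, falseCount}) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    /** Returns {threshold, gain} for the best split, or null if no split qualifies. */
    static double[] bestThreshold(double[] values, boolean[] labels, int minSplitSize) {
        int n = values.length;
        Integer[] order = new Integer[n];
        int totalTrue = 0;
        for (int i = 0; i < n; i++) {
            order[i] = i;
            if (labels[i]) {
                totalTrue++;
            }
        }
        // Visit examples in increasing order of value.
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        double baseEntropy = entropy(totalTrue, n - totalTrue);
        double bestGain = 0.0;
        double[] best = null;
        int leftTrue = 0;
        for (int i = 1; i < n; i++) {
            if (labels[order[i - 1]]) {
                leftTrue++;
            }
            double low = values[order[i - 1]];
            double high = values[order[i]];
            int leftSize = i;
            int rightSize = n - i;
            if (low == high || leftSize < minSplitSize || rightSize < minSplitSize) {
                continue; // No boundary between equal values, or a side is too small.
            }
            int rightTrue = totalTrue - leftTrue;
            double gain = baseEntropy
                - ((double) leftSize / n) * entropy(leftTrue, leftSize - leftTrue)
                - ((double) rightSize / n) * entropy(rightTrue, rightSize - rightTrue);
            if (gain > bestGain) {
                bestGain = gain;
                best = new double[] {(low + high) / 2.0, gain}; // Midpoint is an assumption.
            }
        }
        return best;
    }
}
```

    In this sketch, raising the minimum split size removes candidate thresholds near the ends of the sorted values, so a large enough value leaves no valid split at all.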
- Method Detail
  - clone
```
public VectorThresholdInformationGainLearner<OutputType> clone()
```
    Description copied from class: AbstractCloneableSerializable
    
    This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to create a clone of the exact type of the object that clone is called on, copying all primitives but keeping all references, which means it is a shallow copy.
    
    Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy" that targets the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
    
    Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
    
    Specified by:
    
    clone in interface CloneableSerializable
    
    Overrides:
    
    clone in class AbstractVectorThresholdMaximumGainLearner<OutputType>
    
    Returns:
    
    A clone of this object.
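    
    The "smart copy" pattern described above can be sketched with plain `java.lang.Cloneable`. The class `ScratchSpaceHolder` and its fields are hypothetical and stand in for a learner with one primitive field and one array field. The sketch does not show this class's actual clone implementation; in particular, which of its fields are deep-copied is not stated on this page.

```java
/** Hypothetical class illustrating super.clone() plus a deeper copy of one field. */
class ScratchSpaceHolder implements Cloneable {

    int minSplitSize = 1;               // Primitive: copied by super.clone().
    double[] scratch = new double[2];   // Reference: shared unless copied explicitly.

    @Override
    public ScratchSpaceHolder clone() {
        try {
            // Start from the shallow copy, as the contract above requires.
            ScratchSpaceHolder copy = (ScratchSpaceHolder) super.clone();
            // Give the copy its own array so the two objects do not share state.
            copy.scratch = (this.scratch == null) ? null : this.scratch.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

    Without the explicit array copy, writing to `scratch` through the clone would also change the original, which is the hazard of the default shallow copy.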
  - computeSplitGain
```
public double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts,
                               DefaultDataDistribution<OutputType> positiveCounts,
                               DefaultDataDistribution<OutputType> negativeCounts)
```
    Description copied from class: AbstractVectorThresholdMaximumGainLearner
    
    Computes the gain of a given split. The base counts contains the category information before the split.
    
    Specified by:
    
    computeSplitGain in class AbstractVectorThresholdMaximumGainLearner<OutputType>
    
    Parameters:
    
    baseCounts - The base category information before splitting. Contains the sum of the positive and negative counts.
    
    positiveCounts - The category information on the positive side of the split.
    
    negativeCounts - The category information on the negative side of the split.
    
    Returns:
    
    The gain of the given split computed by comparing the positive and negative counts to the base counts.
  - configure
```
public void configure(java.util.Map<OutputType,java.lang.Double> priors,
                      java.util.Map<OutputType,java.lang.Integer> trainCounts)
```
    Description copied from interface: PriorWeightedNodeLearner
    
    Configure the node learner with prior weights and training counts.
    
    If the prior weights are not specified, this method will configure default priors that match the relative frequencies of the different categories in the training data. The frequencies are based on the given category counts from the training data.
    
    Specified by:
    
    configure in interface PriorWeightedNodeLearner<OutputType>
    
    Parameters:
    
    priors - Prior weights for each of the possible output values (i.e., the categories for the prediction task). If null, the method will estimate default priors from the training counts.
    
    trainCounts - Frequency counts of the possible output values (i.e., categories) in the training data. This parameter should always be specified.
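    
    The default-prior rule described above (relative frequencies taken from the training counts when no priors are given) can be sketched as follows. `DefaultPriorsSketch` and `resolvePriors` are hypothetical names, and the handling of an all-zero count map is an assumption; this is not the library's implementation of `configure`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Cognitive Foundry. */
final class DefaultPriorsSketch {

    /** Returns the given priors, or relative frequencies from trainCounts if priors is null. */
    static <T> Map<T, Double> resolvePriors(Map<T, Double> priors, Map<T, Integer> trainCounts) {
        if (priors != null) {
            return priors;
        }
        long total = 0;
        for (int count : trainCounts.values()) {
            total += count;
        }
        Map<T, Double> defaults = new LinkedHashMap<>();
        for (Map.Entry<T, Integer> entry : trainCounts.entrySet()) {
            // Assumption: with no training examples at all, every prior is set to 0.
            double prior = (total == 0) ? 0.0 : (double) entry.getValue() / total;
            defaults.put(entry.getKey(), prior);
        }
        return defaults;
    }
}
```

    For example, training counts of 30 for one category and 10 for another yield default priors of 0.75 and 0.25 in this sketch.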
