AbstractVectorThresholdMaximumGainLearner (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.util.AbstractCloneableSerializable
- - gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner<OutputType>

Type Parameters:

OutputType - The output category type for the training data.

All Implemented Interfaces:

BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>>,VectorElementThresholdCategorizer>, DimensionFilterableLearner, DeciderLearner<Vectorizable,OutputType,java.lang.Boolean,VectorElementThresholdCategorizer>, VectorThresholdLearner<OutputType>, CloneableSerializable, java.io.Serializable, java.lang.Cloneable

Direct Known Subclasses:

VectorThresholdGiniImpurityLearner, VectorThresholdHellingerDistanceLearner, VectorThresholdInformationGainLearner
```
public abstract class AbstractVectorThresholdMaximumGainLearner<OutputType>
extends AbstractCloneableSerializable
implements VectorThresholdLearner<OutputType>
```
An abstract class for decider learners that produce a threshold function on a vector element based on maximizing some gain value. It handles the looping over the elements of the vector and then for each element looping over the possible split points. Subclasses only need to define a method to compute the gain of a given split.

Since:

3.0

Author:

Justin Basilico

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`DEFAULT_MIN_SPLIT_SIZE` The default value for the minimum split size is 1.
`protected int[]`	`dimensionsToConsider` The array of dimensions for the learner to consider.
`protected int`	`minSplitSize` The threshold for allowing a split to be made, determined by how many instances fall on the left and right sides of the split.

Constructor Summary

Constructors
Constructor and Description
`AbstractVectorThresholdMaximumGainLearner()` Creates a new `AbstractVectorThresholdMaximumGainLearner`.
`AbstractVectorThresholdMaximumGainLearner(int minSplitSize, int[] dimensionsToConsider)` Creates a new `AbstractVectorThresholdMaximumGainLearner`.

Method Summary

Modifier and Type	Method and Description
`AbstractVectorThresholdMaximumGainLearner<OutputType>`	`clone()` This makes public the clone method on the `Object` class and removes the exception that it throws.
`DefaultPair<java.lang.Double,java.lang.Double>`	`computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts)` Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension.
`protected DefaultPair<java.lang.Double,java.lang.Double>`	`computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts, java.util.ArrayList<DefaultWeightedValue<OutputType>> values)` Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension.
`abstract double`	`computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)` Computes the gain of a given split.
`int[]`	`getDimensionsToConsider()` Gets the dimensions that the learner is to consider.
`int`	`getMinSplitSize()` Gets the minimum split size.
`VectorElementThresholdCategorizer`	`learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data)` The `learn` method creates an object of `ResultType` using data of type `DataType`, using some form of "learning" algorithm.
`void`	`setDimensionsToConsider(int... dimensionsToConsider)` Sets the dimensions that the learner is to consider.
`void`	`setMinSplitSize(int minSplitSize)` Sets the minimum split size.

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_MIN_SPLIT_SIZE
```
public static final int DEFAULT_MIN_SPLIT_SIZE
```
    The default value for the minimum split size is 1.
    
    See Also:
    
    Constant Field Values
  - minSplitSize
```
protected int minSplitSize
```
    The threshold for allowing a split to be made, determined by how many instances fall on the left and right sides of the split. Both sides must have at least this number of instances. Must be positive.
  - dimensionsToConsider
```
protected int[] dimensionsToConsider
```
    The array of dimensions for the learner to consider. If this is null, then all dimensions are considered.
- Constructor Detail
  - AbstractVectorThresholdMaximumGainLearner
```
public AbstractVectorThresholdMaximumGainLearner()
```
    Creates a new AbstractVectorThresholdMaximumGainLearner.
  - AbstractVectorThresholdMaximumGainLearner
```
public AbstractVectorThresholdMaximumGainLearner(int minSplitSize,
                                                 int[] dimensionsToConsider)
```
    Creates a new AbstractVectorThresholdMaximumGainLearner.
    
    Parameters:
    
    minSplitSize - The minimum split size. Must be positive.
    
    dimensionsToConsider - The array of vector dimensions to consider. Null means all of them are considered.
- Method Detail
  - clone
```
public AbstractVectorThresholdMaximumGainLearner<OutputType> clone()
```
    Description copied from class: AbstractCloneableSerializable
    
    Makes the clone method of the Object class public and removes the exception that it throws. By default it creates a clone of the exact type of the object it is called on, copying all primitives but keeping all references, which makes it a shallow copy. Extensions of this class may want to override this method (while still calling super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for duplicating the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
    
    Specified by:
    
    clone in interface CloneableSerializable
    
    Overrides:
    
    clone in class AbstractCloneableSerializable
    
    Returns:
    
    A clone of this object.
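
    The "smart copy" contract described above can be illustrated with a minimal, self-contained sketch. `CloneSketch` is a hypothetical stand-in, not a Cognitive Foundry class, and whether the real learner deep-copies `dimensionsToConsider` is an assumption made here for illustration only.

```java
// Hypothetical stand-in for a learner; not part of Cognitive Foundry.
class CloneSketch implements Cloneable {
    protected int minSplitSize = 1;
    protected int[] dimensionsToConsider; // null means all dimensions

    @Override
    public CloneSketch clone() {
        try {
            // Start from super.clone(), which copies primitives and
            // keeps references (a shallow copy).
            CloneSketch copy = (CloneSketch) super.clone();
            // Assumption: deepen only the mutable array field so the
            // original and the copy do not share it.
            if (this.dimensionsToConsider != null) {
                copy.dimensionsToConsider = this.dimensionsToConsider.clone();
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            // Not expected, because this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

    Only the array field needs explicit handling; the primitive `minSplitSize` is already copied by `super.clone()`.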
  - learn
```
public VectorElementThresholdCategorizer learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data)
```
    Description copied from interface: BatchLearner
    
    The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm.
    
    Specified by:
    
    learn in interface BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>>,VectorElementThresholdCategorizer>
    
    Parameters:
    
    data - The data that the learning algorithm will use to create an object of ResultType.
    
    Returns:
    
    The object that is created based on the given data using the learning algorithm.
  - computeBestGainAndThreshold
```
public DefaultPair<java.lang.Double,java.lang.Double> computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data,
                                                                                  int dimension,
                                                                                  DefaultDataDistribution<OutputType> baseCounts)
```
    Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension.
    
    Parameters:
    
    data - The data to use to compute the threshold.
    
    dimension - The dimension to compute the threshold for.
    
    baseCounts - Information about the base category counts.
    
    Returns:
    
    A pair containing the best gain computed and its associated threshold. If there is no good split point, null is returned. This can happen if there is no data or every value is the same.
  - computeBestGainAndThreshold
```
protected DefaultPair<java.lang.Double,java.lang.Double> computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data,
                                                                                     int dimension,
                                                                                     DefaultDataDistribution<OutputType> baseCounts,
                                                                                     java.util.ArrayList<DefaultWeightedValue<OutputType>> values)
```
    Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension.
    
    Parameters:
    
    data - The data to use to compute the threshold.
    
    dimension - The dimension to compute the threshold for.
    
    baseCounts - Information about the base category counts.
    
    values - A workspace to store the values of the data in. Recycled to avoid recreating a large array each time.
    
    Returns:
    
    A pair containing the best gain computed and its associated threshold. If there is no good split point, null is returned. This can happen if there is no data or every value is the same.
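
    The per-dimension scan described above can be sketched without the library types. Everything in this block is hypothetical: plain arrays replace `InputOutputPair` and `DefaultDataDistribution`, `GainFunction` stands in for `computeSplitGain`, and placing the threshold at the midpoint between adjacent distinct values, with values at or above the threshold counted as positive, is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical, simplified sketch; names are not from Cognitive Foundry.
final class SplitScanSketch {

    /** Caller-supplied gain, playing the role of computeSplitGain. */
    interface GainFunction {
        double gain(int[] baseCounts, int[] positiveCounts, int[] negativeCounts);
    }

    /**
     * Scans candidate thresholds for one dimension.
     * Returns {bestGain, bestThreshold}, or null if no valid split exists.
     */
    static double[] bestGainAndThreshold(double[][] inputs, int[] labels,
            int numCategories, int dimension, int minSplitSize, GainFunction f) {
        int n = inputs.length;
        if (n < 2 * minSplitSize) {
            return null; // not enough examples for two valid sides
        }

        // Sort example indices by their value in the given dimension.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order,
            Comparator.comparingDouble((Integer i) -> inputs[i][dimension]));

        // Base counts per category; all examples start on the positive side.
        int[] base = new int[numCategories];
        for (int label : labels) {
            base[label]++;
        }
        int[] positive = base.clone();
        int[] negative = new int[numCategories];

        double[] best = null;
        for (int k = 0; k < n - 1; k++) {
            // Move example k from the positive side to the negative side.
            int index = order[k];
            negative[labels[index]]++;
            positive[labels[index]]--;

            double current = inputs[index][dimension];
            double next = inputs[order[k + 1]][dimension];
            int negativeSize = k + 1;
            int positiveSize = n - negativeSize;
            if (current == next || negativeSize < minSplitSize
                    || positiveSize < minSplitSize) {
                continue; // no threshold separates equal values, or a side is too small
            }

            double gain = f.gain(base, positive, negative);
            if (best == null || gain > best[0]) {
                // Assumption: threshold at the midpoint of adjacent values.
                best = new double[] {gain, (current + next) / 2.0};
            }
        }
        return best;
    }
}
```

    As in the documented contract, the sketch returns null when there is no data, when every value in the dimension is the same, or when no split leaves at least `minSplitSize` examples on each side.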
  - computeSplitGain
```
public abstract double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts,
                                        DefaultDataDistribution<OutputType> positiveCounts,
                                        DefaultDataDistribution<OutputType> negativeCounts)
```
    Computes the gain of a given split. The base counts contains the category information before the split.
    
    Parameters:
    
    baseCounts - The base category information before splitting. Contains the sum of the positive and negative counts.
    
    positiveCounts - The category information on the positive side of the split.
    
    negativeCounts - The category information on the negative side of the split.
    
    Returns:
    
    The gain of the given split computed by comparing the positive and negative counts to the base counts.
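
    As one possible gain, the sketch below computes the textbook reduction in Gini impurity from the three sets of counts. It is an illustration under stated assumptions: `GiniGainSketch` is a hypothetical class using plain `double[]` counts in place of `DefaultDataDistribution`, and it is not claimed to match the formula used by `VectorThresholdGiniImpurityLearner`.

```java
// Hypothetical sketch of a split gain; not the library's implementation.
final class GiniGainSketch {

    /** Sums the category counts. */
    static double total(double[] counts) {
        double sum = 0.0;
        for (double count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Gini impurity: 1 minus the sum of squared category proportions. */
    static double giniImpurity(double[] counts) {
        double total = total(counts);
        if (total <= 0.0) {
            return 0.0; // an empty side contributes no impurity
        }
        double sumOfSquares = 0.0;
        for (double count : counts) {
            double proportion = count / total;
            sumOfSquares += proportion * proportion;
        }
        return 1.0 - sumOfSquares;
    }

    /**
     * Gain of a split: base impurity minus the size-weighted impurity
     * of the positive and negative sides.
     */
    static double splitGain(double[] baseCounts, double[] positiveCounts,
            double[] negativeCounts) {
        double baseTotal = total(baseCounts);
        if (baseTotal <= 0.0) {
            return 0.0;
        }
        double positiveWeight = total(positiveCounts) / baseTotal;
        double negativeWeight = total(negativeCounts) / baseTotal;
        return giniImpurity(baseCounts)
            - positiveWeight * giniImpurity(positiveCounts)
            - negativeWeight * giniImpurity(negativeCounts);
    }
}
```

    With this definition, a split that separates the categories perfectly recovers the full base impurity as gain, while a split whose sides have the same category proportions as the base has zero gain.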
  - getDimensionsToConsider
```
public int[] getDimensionsToConsider()
```
    Description copied from interface: DimensionFilterableLearner
    
    Gets the dimensions that the learner is to consider. Null means that all of them are included.
    
    Specified by:
    
    getDimensionsToConsider in interface DimensionFilterableLearner
    
    Returns:
    
    The array of vector dimensions to consider. Null means all of them are considered.
  - setDimensionsToConsider
```
public void setDimensionsToConsider(int... dimensionsToConsider)
```
    Description copied from interface: DimensionFilterableLearner
    
    Sets the dimensions that the learner is to consider. Null means that all of them are included.
    
    Specified by:
    
    setDimensionsToConsider in interface DimensionFilterableLearner
    
    Parameters:
    
    dimensionsToConsider - The array of vector dimensions to consider. Null means all of them are considered.
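
    The "null means all dimensions" convention can be illustrated with a small helper. `DimensionFilterSketch` and `resolveDimensions` are hypothetical names introduced for this sketch; how the learner itself expands a null array is not specified here.

```java
// Hypothetical helper; not part of Cognitive Foundry.
final class DimensionFilterSketch {

    /**
     * Returns the dimensions to loop over: a copy of the given array,
     * or every index from 0 to dimensionality - 1 when the array is null.
     */
    static int[] resolveDimensions(int[] dimensionsToConsider, int dimensionality) {
        if (dimensionsToConsider != null) {
            return dimensionsToConsider.clone();
        }
        int[] all = new int[dimensionality];
        for (int i = 0; i < dimensionality; i++) {
            all[i] = i;
        }
        return all;
    }
}
```

    Because the setter takes a varargs parameter, a call such as `setDimensionsToConsider(0, 2)` passes the array `{0, 2}`.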
  - getMinSplitSize
```
public int getMinSplitSize()
```
    Gets the minimum split size. This is the minimum number of examples that must fall on each side of the split for it to be valid. If the input data contains fewer than twice this number of examples, then no split is returned.
    
    Returns:
    
    The minimum split size. Must be positive.
  - setMinSplitSize
```
public void setMinSplitSize(int minSplitSize)
```
    Sets the minimum split size. This is the minimum number of examples that must fall on each side of the split for it to be valid. If the input data contains fewer than twice this number of examples, then no split is returned.
    
    Parameters:
    
    minSplitSize - The minimum split size. Must be positive.
