VectorThresholdGiniImpurityLearner (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.util.AbstractCloneableSerializable
  - gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner<OutputType>
    - gov.sandia.cognition.learning.algorithm.tree.VectorThresholdGiniImpurityLearner<OutputType>

Type Parameters:

OutputType - The type of the output categories to learn over.

All Implemented Interfaces:

BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>>,VectorElementThresholdCategorizer>, DimensionFilterableLearner, DeciderLearner<Vectorizable,OutputType,java.lang.Boolean,VectorElementThresholdCategorizer>, VectorThresholdLearner<OutputType>, CloneableSerializable, java.io.Serializable, java.lang.Cloneable
```
@PublicationReference(author="Wikipedia",
                      title="Decision tree learning",
                      year=2010,
                      type=WebPage,
                      url="http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity")
public class VectorThresholdGiniImpurityLearner<OutputType>
extends AbstractVectorThresholdMaximumGainLearner<OutputType>
```
Learns vector thresholds based on the Gini impurity measure. It attempts to minimize the Gini impurity in splits. If f_i is the fraction of examples in a split that belong to category i, then the Gini impurity of that split is defined as:
sum_i f_i * (1 - f_i)
Notice that sum_i f_i = 1, so the value will range between 0 and 1. It is 0 when every example in the split belongs to a single category.

This measure is the one used in the Classification and Regression Tree (CART) algorithm.
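
As a worked illustration of the definition above, the sum can be rewritten using sum_i f_i = 1. The three category fractions in the second line are made-up numbers chosen only for the arithmetic, and k in the last line is an assumed symbol for the number of categories.

```latex
G = \sum_i f_i (1 - f_i) = \sum_i f_i - \sum_i f_i^2 = 1 - \sum_i f_i^2

% Hypothetical fractions f = (0.5, 0.3, 0.2):
G = 1 - (0.25 + 0.09 + 0.04) = 0.62

% k equally frequent categories, f_i = 1/k:
G = 1 - \frac{1}{k}
```

The last line shows why the value stays strictly below 1: it approaches 1 only as the number of equally frequent categories grows.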

Since:

3.0

Author:

Justin Basilico

See Also:

Serialized Form

Field Summary
- Fields inherited from class gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner
  DEFAULT_MIN_SPLIT_SIZE, dimensionsToConsider, minSplitSize

Constructor Summary

Constructors
Constructor and Description
`VectorThresholdGiniImpurityLearner()` Creates a new instance of VectorThresholdGiniImpurityLearner.
`VectorThresholdGiniImpurityLearner(int minSplitSize)` Creates a new `VectorThresholdGiniImpurityLearner`.

Method Summary

Modifier and Type	Method and Description
`VectorThresholdGiniImpurityLearner<OutputType>`	`clone()` This makes public the clone method on the `Object` class and removes the exception that it throws.
`double`	`computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)` Computes the split gain by computing the Gini impurity for the given split.
`static <DataType> double`	`giniImpurity(DefaultDataDistribution<DataType> counts)` Computes the Gini impurity of a histogram.

Methods inherited from class gov.sandia.cognition.learning.algorithm.tree.AbstractVectorThresholdMaximumGainLearner
computeBestGainAndThreshold, computeBestGainAndThreshold, getDimensionsToConsider, getMinSplitSize, learn, setDimensionsToConsider, setMinSplitSize

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - VectorThresholdGiniImpurityLearner
```
public VectorThresholdGiniImpurityLearner()
```
    Creates a new instance of VectorThresholdGiniImpurityLearner.
  - VectorThresholdGiniImpurityLearner
```
public VectorThresholdGiniImpurityLearner(int minSplitSize)
```
    Creates a new VectorThresholdGiniImpurityLearner.
    
    Parameters:
    
    minSplitSize - The minimum split size. Must be positive.
- Method Detail
  - clone
```
public VectorThresholdGiniImpurityLearner<OutputType> clone()
```
    Description copied from class: AbstractCloneableSerializable
    
    This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy.
    
    Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, a copy that targets the most common use case for copying the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
    
    Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
    
    Specified by:
    
    clone in interface CloneableSerializable
    
    Overrides:
    
    clone in class AbstractVectorThresholdMaximumGainLearner<OutputType>
    
    Returns:
    
    A clone of this object.
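    
    The "smart copy" contract described above can be sketched in plain Java. This is only an illustration of the pattern: CloneableBase is a hypothetical stand-in for AbstractCloneableSerializable, and ThresholdSettings with its fields is an invented example class, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a base class that makes clone() public
// and hides the checked exception.
class CloneableBase implements Cloneable {
    @Override
    public CloneableBase clone() {
        try {
            // Object.clone() produces a shallow copy of the exact runtime type.
            return (CloneableBase) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new RuntimeException(e);
        }
    }
}

// Invented subclass with one primitive field and one mutable reference field.
class ThresholdSettings extends CloneableBase {
    int minSplitSize = 1;
    List<Integer> dimensions = new ArrayList<>();

    @Override
    public ThresholdSettings clone() {
        // Start from super.clone(), as the contract requires.
        ThresholdSettings copy = (ThresholdSettings) super.clone();
        // Only the mutable reference needs a deeper copy; the primitive
        // was already copied by the shallow clone.
        copy.dimensions = new ArrayList<>(this.dimensions);
        return copy;
    }
}
```

    With this override, adding a dimension to a clone leaves the original's list untouched, while a purely shallow copy would share the list between both objects.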
  - computeSplitGain
```
public double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts,
                               DefaultDataDistribution<OutputType> positiveCounts,
                               DefaultDataDistribution<OutputType> negativeCounts)
```
    Computes the split gain by computing the Gini impurity for the given split.
    
    Specified by:
    
    computeSplitGain in class AbstractVectorThresholdMaximumGainLearner<OutputType>
    
    Parameters:
    
    baseCounts - The histogram of counts before the split.
    
    positiveCounts - The counts on the positive side of the threshold.
    
    negativeCounts - The counts on the negative side of the threshold.
    
    Returns:
    
    The gain in Gini impurity for the given split. Will be between 0.0 and 1.0.
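    
    This page does not spell out the gain formula. A common definition, assumed here, is the impurity of the counts before the split minus the size-weighted impurities of the two sides. The sketch below uses plain java.util.Map histograms instead of DefaultDataDistribution, and the class and method names are hypothetical.

```java
import java.util.Map;

class GiniSplitGainSketch {
    // Total number of examples in a histogram of category counts.
    static <T> double total(Map<T, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Gini impurity: sum_i f_i * (1 - f_i), with f_i = count_i / total.
    static <T> double giniImpurity(Map<T, Integer> counts) {
        double n = total(counts);
        if (n <= 0.0) {
            return 0.0; // Assumption: an empty histogram counts as pure.
        }
        double impurity = 0.0;
        for (int c : counts.values()) {
            double f = c / n;
            impurity += f * (1.0 - f);
        }
        return impurity;
    }

    // Assumed gain: base impurity minus the weighted impurity of each side,
    // where each weight is that side's share of the examples.
    static <T> double splitGain(Map<T, Integer> base,
                                Map<T, Integer> positive,
                                Map<T, Integer> negative) {
        double n = total(base);
        if (n <= 0.0) {
            return 0.0;
        }
        double weightPositive = total(positive) / n;
        double weightNegative = total(negative) / n;
        return giniImpurity(base)
            - weightPositive * giniImpurity(positive)
            - weightNegative * giniImpurity(negative);
    }
}
```

    Under this definition, a split that separates two equally sized categories perfectly has gain 0.5, and a split that leaves both sides with the same category mix as the base has gain 0.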
  - giniImpurity
```
public static <DataType> double giniImpurity(DefaultDataDistribution<DataType> counts)
```
    Computes the Gini impurity of a histogram. For each item in the histogram, it is the probability that it is randomly assigned to the wrong category, given the frequency of the different categories. This is computed by looping over all the categories and multiplying the fraction of elements in that category (f_i) times the probability of choosing a different category (1 - f_i). That is: sum_i f_i * (1 - f_i)
    
    Type Parameters:
    
    DataType - The type of data the counts are over.
    
    Parameters:
    
    counts - The distribution to compute the impurity over.
    
    Returns:
    
    The Gini impurity of the given distribution.
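    
    The loop described above can be sketched without the library types. This is a minimal illustration over a java.util.Map of category counts, not the library's implementation. The class name is hypothetical, and returning 0 for an empty histogram is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GiniImpuritySketch {
    // Computes sum_i f_i * (1 - f_i), where f_i is the fraction of
    // elements that fall in category i.
    static <T> double giniImpurity(Map<T, Integer> counts) {
        double total = 0.0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            return 0.0; // Assumption: treat an empty histogram as pure.
        }
        double impurity = 0.0;
        for (int c : counts.values()) {
            double f = c / total;        // fraction in this category
            impurity += f * (1.0 - f);   // chance of picking a different one
        }
        return impurity;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 5);
        counts.put("b", 5);
        System.out.println(giniImpurity(counts)); // prints 0.5
    }
}
```

    A histogram with all elements in one category gives 0, and two equally frequent categories give 0.5.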
