OutputType - The output type of the data.
public class VectorThresholdInformationGainLearner<OutputType> extends AbstractVectorThresholdMaximumGainLearner<OutputType> implements PriorWeightedNodeLearner<OutputType>
VectorThresholdInformationGainLearner computes the best
threshold over a dataset of vectors using information gain to determine the
optimal index and threshold. This is an implementation of what is used in
the C4.5 decision tree algorithm.
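
To illustrate how information gain can pick a threshold along a single dimension, here is a minimal sketch on plain arrays. It is not the library's implementation: the class `InfoGainSketch` and its methods are hypothetical, categories are assumed to be encoded as integers `0..numCategories-1`, the "positive" side is assumed to hold values at or above the threshold, and the minimum split size is ignored.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the library: information gain on count arrays. */
final class InfoGainSketch {

    /** Shannon entropy, in bits, of a histogram of category counts. */
    static double entropy(int[] counts) {
        long total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Gain = entropy before the split minus the size-weighted entropy of both sides. */
    static double splitGain(int[] positive, int[] negative) {
        int[] base = new int[positive.length];
        long nPos = 0, nNeg = 0;
        for (int i = 0; i < positive.length; i++) {
            base[i] = positive[i] + negative[i];
            nPos += positive[i];
            nNeg += negative[i];
        }
        long n = nPos + nNeg;
        if (n == 0) return 0.0;
        return entropy(base)
            - ((double) nPos / n) * entropy(positive)
            - ((double) nNeg / n) * entropy(negative);
    }

    /**
     * Scans one dimension and returns {bestGain, bestThreshold},
     * or null when no two values differ (no split is possible).
     */
    static double[] bestThreshold(double[] values, int[] labels, int numCategories) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        int[] negative = new int[numCategories]; // examples below the threshold
        int[] positive = new int[numCategories]; // examples at or above it (assumption)
        for (int label : labels) positive[label]++;

        double[] best = null;
        for (int k = 0; k + 1 < order.length; k++) {
            int i = order[k];
            positive[labels[i]]--; // move example i to the negative side
            negative[labels[i]]++;
            double lo = values[i];
            double hi = values[order[k + 1]];
            if (lo == hi) continue; // cannot place a threshold between equal values
            double gain = splitGain(positive, negative);
            if (best == null || gain > best[0]) {
                best = new double[] { gain, (lo + hi) / 2.0 };
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] result = bestThreshold(
            new double[] { 1.0, 2.0, 3.0, 4.0 }, new int[] { 0, 0, 1, 1 }, 2);
        System.out.println("gain=" + result[0] + " threshold=" + result[1]);
    }
}
```

Repeating the scan over every dimension under consideration and keeping the highest gain would yield both an index and a threshold, which is the kind of result the class description refers to.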
| Modifier and Type | Field and Description |
|---|---|
| protected java.util.ArrayList<OutputType> | categories - The categories for the prior. |
| protected int[] | categoryCounts - The counts for each category. |
| protected double[] | categoryPriors - The priors for each category. |
| protected double[] | categoryProbabilities - Scratch space used when computing weighted entropy. |

Inherited fields: DEFAULT_MIN_SPLIT_SIZE, dimensionsToConsider, minSplitSize

| Constructor and Description |
|---|
| VectorThresholdInformationGainLearner() - Creates a new instance of VectorThresholdInformationGainLearner. |
| VectorThresholdInformationGainLearner(int minSplitSize) - Creates a new VectorThresholdInformationGainLearner. |

| Modifier and Type | Method and Description |
|---|---|
| VectorThresholdInformationGainLearner<OutputType> | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
| double | computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts) - Computes the gain of a given split. |
| void | configure(java.util.Map<OutputType,java.lang.Double> priors, java.util.Map<OutputType,java.lang.Integer> trainCounts) - Configure the node learner with prior weights and training counts. |

Inherited methods: computeBestGainAndThreshold, computeBestGainAndThreshold, getDimensionsToConsider, getMinSplitSize, learn, setDimensionsToConsider, setMinSplitSize

protected java.util.ArrayList<OutputType> categories
protected double[] categoryPriors
protected int[] categoryCounts
protected double[] categoryProbabilities
public VectorThresholdInformationGainLearner()
public VectorThresholdInformationGainLearner(int minSplitSize)
Creates a new VectorThresholdInformationGainLearner.
minSplitSize - The minimum split size. Must be positive.
public VectorThresholdInformationGainLearner<OutputType> clone()
From AbstractCloneableSerializableObject: This makes public the clone method on the Object class and
removes the exception that it throws. Its default behavior is to
automatically create a clone of the exact type of object that the
clone is called on and to copy all primitives but to keep all references,
which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, a copy that targets
the most common use case for creating a copy of the object. Because
the default behavior is a shallow copy, extending classes only need
to handle fields that need a deeper copy (or those that need to
be reset). Some of the methods in ObjectUtil may be helpful in
implementing a custom clone method.
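
The "smart copy" pattern described above can be sketched as follows. This is an illustration only: `CountingLearnerSketch` and its fields are hypothetical, and the class uses plain `java.lang.Cloneable` rather than the library's base class, so it has to catch `CloneNotSupportedException` itself.

```java
import java.util.ArrayList;

/** Hypothetical class illustrating a "smart copy"; not part of the library. */
class CountingLearnerSketch implements Cloneable {

    /** Primitive field: super.clone() already copies it. */
    protected int minSplitSize = 1;

    /** Reference field: shared with the original unless copied explicitly. */
    protected ArrayList<String> categories = new ArrayList<>();

    /** Scratch space: reset in the clone rather than copied. */
    protected double[] scratch = new double[0];

    @Override
    public CountingLearnerSketch clone() {
        try {
            // Start from the shallow copy, as the contract requires.
            CountingLearnerSketch copy = (CountingLearnerSketch) super.clone();
            // Deepen only the fields that must not be shared.
            copy.categories = new ArrayList<>(this.categories);
            copy.scratch = new double[0];
            return copy;
        } catch (CloneNotSupportedException e) {
            // Unreachable here because this class implements Cloneable.
            throw new AssertionError("clone not supported", e);
        }
    }
}
```

Starting from `super.clone()` keeps the runtime type of the copy correct for subclasses, so each level of the hierarchy only has to handle its own fields.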
Note: The contract of this method is that you must use
super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractVectorThresholdMaximumGainLearner<OutputType>
public double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)
Computes the gain of a given split (description from AbstractVectorThresholdMaximumGainLearner).
computeSplitGain in class AbstractVectorThresholdMaximumGainLearner<OutputType>
baseCounts - The base category information before splitting. Contains the sum of the positive and negative counts.
positiveCounts - The category information on the positive side of the split.
negativeCounts - The category information on the negative side of the split.
public void configure(java.util.Map<OutputType,java.lang.Double> priors, java.util.Map<OutputType,java.lang.Integer> trainCounts)
Configure the node learner with prior weights and training counts (description from PriorWeightedNodeLearner).
configure in interface PriorWeightedNodeLearner<OutputType>
priors - Prior weights for each of the possible output values (i.e., the categories for the prediction task). If null, the method will estimate default priors from the training counts.
trainCounts - Frequency counts of the possible output values (i.e., categories) in the training data. This parameter should always be specified.
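
The documentation does not say how default priors are estimated when priors is null. One plausible reading, shown here purely as an assumption, is to use each category's relative frequency in the training counts. The class `PriorSketch` and the method `resolvePriors` are hypothetical and not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the library. */
final class PriorSketch {

    /**
     * Returns the supplied priors, or, when priors is null, relative
     * frequencies computed from trainCounts (an assumed estimator).
     */
    static <T> Map<T, Double> resolvePriors(
            Map<T, Double> priors, Map<T, Integer> trainCounts) {
        if (priors != null) {
            return priors;
        }
        long total = 0;
        for (int count : trainCounts.values()) {
            total += count;
        }
        Map<T, Double> estimated = new LinkedHashMap<>();
        for (Map.Entry<T, Integer> entry : trainCounts.entrySet()) {
            double prior = (total == 0) ? 0.0 : (double) entry.getValue() / total;
            estimated.put(entry.getKey(), prior);
        }
        return estimated;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("spam", 3);
        counts.put("ham", 1);
        System.out.println(resolvePriors(null, counts));
    }
}
```

Because trainCounts is always read, this sketch matches the note that the parameter should always be specified, even when explicit priors are given.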