OutputType - The output type of the data.
public class VectorThresholdInformationGainLearner<OutputType> extends AbstractVectorThresholdMaximumGainLearner<OutputType> implements PriorWeightedNodeLearner<OutputType>
VectorThresholdInformationGainLearner computes the best threshold over a dataset of vectors using information gain to determine the optimal index and threshold. This is an implementation of what is used in the C4.5 decision tree algorithm.
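The idea of picking a threshold by information gain can be sketched in plain Java. This is an illustration under stated assumptions, not the library's code: the class `InformationGainSketch` and its methods are hypothetical, categories are encoded as integer labels, values below the threshold are treated as the negative side, and prior weighting and the minimum split size are ignored.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: information gain of a threshold split on one dimension. */
final class InformationGainSketch {

    /** Shannon entropy (in bits) of a vector of category counts. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    /** Gain = entropy before the split minus the size-weighted entropy of both sides. */
    static double splitGain(int[] positive, int[] negative) {
        int[] base = new int[positive.length];
        for (int i = 0; i < base.length; i++) {
            base[i] = positive[i] + negative[i];
        }
        int nPos = Arrays.stream(positive).sum();
        int nNeg = Arrays.stream(negative).sum();
        int n = nPos + nNeg;
        if (n == 0) {
            return 0.0;
        }
        return entropy(base)
            - ((double) nPos / n) * entropy(positive)
            - ((double) nNeg / n) * entropy(negative);
    }

    /** Scans midpoints between sorted distinct values; returns {bestGain, bestThreshold}. */
    static double[] bestThreshold(double[] values, int[] labels, int numCategories) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        // Start with every example on the positive side, then move them over one by one.
        int[] positive = new int[numCategories];
        int[] negative = new int[numCategories];
        for (int label : labels) {
            positive[label]++;
        }

        double bestGain = 0.0;
        double bestThreshold = Double.NaN; // NaN means no useful split was found
        for (int k = 0; k + 1 < order.length; k++) {
            int label = labels[order[k]];
            positive[label]--;
            negative[label]++;
            double lo = values[order[k]];
            double hi = values[order[k + 1]];
            if (lo == hi) {
                continue; // cannot place a threshold between equal values
            }
            double gain = splitGain(positive, negative);
            if (gain > bestGain) {
                bestGain = gain;
                bestThreshold = (lo + hi) / 2.0;
            }
        }
        return new double[] {bestGain, bestThreshold};
    }

    public static void main(String[] args) {
        double[] best = bestThreshold(new double[] {1, 2, 3, 4}, new int[] {0, 0, 1, 1}, 2);
        System.out.println("gain=" + best[0] + " threshold=" + best[1]);
    }
}
```

Repeating the scan over every vector index and keeping the index with the highest gain would give the "optimal index and threshold" pair described above.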
Modifier and Type | Field and Description |
---|---|
protected java.util.ArrayList<OutputType> | categories - The categories for the prior. |
protected int[] | categoryCounts - The counts for each category. |
protected double[] | categoryPriors - The priors for each category. |
protected double[] | categoryProbabilities - Scratch space used when computing weighted entropy. |
Inherited fields: DEFAULT_MIN_SPLIT_SIZE, dimensionsToConsider, minSplitSize
Constructor and Description |
---|
VectorThresholdInformationGainLearner() - Creates a new instance of VectorThresholdInformationGainLearner. |
VectorThresholdInformationGainLearner(int minSplitSize) - Creates a new VectorThresholdInformationGainLearner. |
Modifier and Type | Method and Description |
---|---|
VectorThresholdInformationGainLearner<OutputType> | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
double | computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts) - Computes the gain of a given split. |
void | configure(java.util.Map<OutputType,java.lang.Double> priors, java.util.Map<OutputType,java.lang.Integer> trainCounts) - Configure the node learner with prior weights and training counts. |
Inherited methods: computeBestGainAndThreshold, computeBestGainAndThreshold, getDimensionsToConsider, getMinSplitSize, learn, setDimensionsToConsider, setMinSplitSize
protected java.util.ArrayList<OutputType> categories
protected double[] categoryPriors
protected int[] categoryCounts
protected double[] categoryProbabilities
public VectorThresholdInformationGainLearner()
public VectorThresholdInformationGainLearner(int minSplitSize)
Creates a new VectorThresholdInformationGainLearner.
minSplitSize - The minimum split size. Must be positive.
public VectorThresholdInformationGainLearner<OutputType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy.
Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractVectorThresholdMaximumGainLearner<OutputType>
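The clone contract described above can be illustrated with a small, self-contained sketch. The classes `Base` and `Learner` below are hypothetical stand-ins (they are not the library's AbstractCloneableSerializable or this learner): `Base` makes clone() public and removes the checked exception, and `Learner` builds on super.clone() and deep-copies only the one field that should not be shared.

```java
import java.util.ArrayList;

/** Hypothetical sketch of the "smart copy" clone pattern. */
final class CloneSketch {

    /** Stand-in base class: public clone() without a checked exception. */
    static class Base implements Cloneable {
        @Override
        public Base clone() {
            try {
                // Object.clone() creates an instance of the runtime type and
                // copies fields shallowly (primitives by value, references shared).
                return (Base) super.clone();
            } catch (CloneNotSupportedException e) {
                // Cannot happen because this class implements Cloneable.
                throw new RuntimeException(e);
            }
        }
    }

    /** Stand-in subclass with one primitive field and one mutable reference field. */
    static class Learner extends Base {
        int minSplitSize = 1;
        ArrayList<String> categories = new ArrayList<>();

        @Override
        public Learner clone() {
            // Always start from super.clone(), as the contract requires.
            Learner result = (Learner) super.clone();
            // Only the mutable list needs a deeper copy; the primitive is already copied.
            result.categories = new ArrayList<>(this.categories);
            return result;
        }
    }

    public static void main(String[] args) {
        Learner original = new Learner();
        original.categories.add("a");
        Learner copy = original.clone();
        copy.categories.add("b");
        System.out.println(original.categories + " vs " + copy.categories);
    }
}
```

Without the extra line in `Learner.clone()`, both objects would share the same list, which is the shallow-copy behavior the description warns about.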
public double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)
Description copied from class: AbstractVectorThresholdMaximumGainLearner
Computes the gain of a given split.
computeSplitGain in class AbstractVectorThresholdMaximumGainLearner<OutputType>
baseCounts - The base category information before splitting. Contains the sum of the positive and negative counts.
positiveCounts - The category information on the positive side of the split.
negativeCounts - The category information on the negative side of the split.
public void configure(java.util.Map<OutputType,java.lang.Double> priors, java.util.Map<OutputType,java.lang.Integer> trainCounts)
Description copied from interface: PriorWeightedNodeLearner
Configure the node learner with prior weights and training counts.
configure in interface PriorWeightedNodeLearner<OutputType>
priors - Prior weights for each of the possible output values (i.e., the categories for the prediction task). If null, the method will estimate default priors from the training counts.
trainCounts - Frequency counts of the possible output values (i.e., categories) in the training data. This parameter should always be specified.
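The null-priors fallback can be pictured with the following sketch. How the library actually estimates default priors is not stated here; using relative frequencies of the training counts is an assumption, and `PriorSketch.resolvePriors` is a hypothetical helper, not part of the API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: fall back to frequency-based priors when none are given. */
final class PriorSketch {

    /**
     * Returns the supplied priors, or, if they are null, priors estimated as
     * count / total from the training counts (an assumed estimation rule).
     */
    static <T> Map<T, Double> resolvePriors(Map<T, Double> priors, Map<T, Integer> trainCounts) {
        if (priors != null) {
            return priors;
        }
        long total = 0;
        for (int count : trainCounts.values()) {
            total += count;
        }
        Map<T, Double> estimated = new LinkedHashMap<>();
        for (Map.Entry<T, Integer> entry : trainCounts.entrySet()) {
            double prior = (total == 0) ? 0.0 : (double) entry.getValue() / total;
            estimated.put(entry.getKey(), prior);
        }
        return estimated;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("spam", 3);
        counts.put("ham", 1);
        System.out.println(resolvePriors(null, counts));
    }
}
```

The sketch also shows why trainCounts should always be specified: it is the only information available when priors is null.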