public class VectorThresholdVarianceLearner extends AbstractCloneableSerializable implements VectorThresholdLearner<java.lang.Double>
VectorThresholdVarianceLearner computes the best threshold over a dataset of vectors, using the reduction in variance to determine the optimal index and threshold. It implements the splitting rule used in the CART regression tree algorithm.

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MIN_SPLIT_SIZE: The default value for the minimum split size is 1. |
protected int[] | dimensionsToConsider: The array of 0-based dimensions to consider in the input. |
protected int | minSplitSize: The threshold for allowing a split to be made, determined by how many instances fall on each of the left and right sides of the split. |
Constructor and Description |
---|
VectorThresholdVarianceLearner(): Creates a new VectorThresholdVarianceLearner. |
VectorThresholdVarianceLearner(int minSplitSize): Creates a new VectorThresholdVarianceLearner. |
VectorThresholdVarianceLearner(int minSplitSize, int... dimensionsToConsider): Creates a new VectorThresholdVarianceLearner. |
Modifier and Type | Method and Description |
---|---|
DefaultPair<java.lang.Double,java.lang.Double> | computeBestGainThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,java.lang.Double>> data, int dimension, double baseVariance): Computes the best information gain-threshold pair for the given dimension on the given data. |
int[] | getDimensionsToConsider(): Gets the dimensions that the learner is to consider. |
int | getMinSplitSize(): Gets the minimum split size. |
VectorElementThresholdCategorizer | learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,java.lang.Double>> data): Learns a VectorElementThresholdCategorizer from the given data by picking the vector element and threshold that maximizes information gain. |
void | setDimensionsToConsider(int... dimensionsToConsider): Sets the dimensions that the learner is to consider. |
void | setMinSplitSize(int minSplitSize): Sets the minimum split size. |
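
The per-dimension search summarized for computeBestGainThreshold can be illustrated with a small standalone sketch. It is not the library's implementation: the class name VarianceSplitSketch, the plain double[] inputs, the choice of the midpoint between neighboring values as the threshold, and the use of population variance are all assumptions made for illustration. The gain is taken to be the base variance minus the size-weighted variance of the two sides, which is the usual CART-style variance reduction.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch only; this is not the library's implementation. */
final class VarianceSplitSketch {

    /** Population variance of values[from, to); returns 0 for an empty range. */
    static double variance(double[] values, int from, int to) {
        int n = to - from;
        if (n <= 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        double mean = sum / n;
        double squares = 0.0;
        for (int i = from; i < to; i++) {
            double delta = values[i] - mean;
            squares += delta * delta;
        }
        return squares / n;
    }

    /**
     * Returns {gain, threshold} for the best split of one feature, or null
     * when no split leaves at least minSplitSize instances on each side.
     */
    static double[] bestGainThreshold(double[] x, double[] y, int minSplitSize) {
        if (minSplitSize < 1) {
            throw new IllegalArgumentException("minSplitSize must be positive");
        }
        final int n = x.length;

        // Sort the instances by feature value, keeping outputs aligned.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> x[i]));
        double[] sortedX = new double[n];
        double[] sortedY = new double[n];
        for (int i = 0; i < n; i++) {
            sortedX[i] = x[order[i]];
            sortedY[i] = y[order[i]];
        }

        double baseVariance = variance(sortedY, 0, n);
        double[] best = null;
        // k is the number of instances that fall on the left side.
        for (int k = minSplitSize; k <= n - minSplitSize; k++) {
            if (sortedX[k - 1] == sortedX[k]) {
                continue; // equal values cannot be separated by a threshold
            }
            double weighted = (k * variance(sortedY, 0, k)
                + (n - k) * variance(sortedY, k, n)) / n;
            double gain = baseVariance - weighted;
            if (best == null || gain > best[0]) {
                // Assumption: place the threshold midway between neighbors.
                best = new double[] {gain, (sortedX[k - 1] + sortedX[k]) / 2.0};
            }
        }
        return best;
    }
}
```

Unlike the documented method, this sketch computes the base variance itself instead of taking it as a parameter. It also recomputes each side's variance from scratch for clarity, so it runs in quadratic time after the sort.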
Inherited methods: clone (from AbstractCloneableSerializable); equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from java.lang.Object).
public static final int DEFAULT_MIN_SPLIT_SIZE
protected int minSplitSize
protected int[] dimensionsToConsider
public VectorThresholdVarianceLearner()
Creates a new VectorThresholdVarianceLearner.

public VectorThresholdVarianceLearner(int minSplitSize)
Creates a new VectorThresholdVarianceLearner.
Parameters:
minSplitSize - The minimum split size. Must be positive.

public VectorThresholdVarianceLearner(int minSplitSize, int... dimensionsToConsider)
Creates a new VectorThresholdVarianceLearner.
Parameters:
minSplitSize - The minimum split size. Must be positive.
dimensionsToConsider - The array of vector dimensions to consider. Null means all of them are considered.

public VectorElementThresholdCategorizer learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,java.lang.Double>> data)
Learns a VectorElementThresholdCategorizer from the given data by picking the vector element and threshold that maximizes information gain.
Specified by:
learn in interface BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,java.lang.Double>>,VectorElementThresholdCategorizer>
Parameters:
data - The data to learn from.

public DefaultPair<java.lang.Double,java.lang.Double> computeBestGainThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,java.lang.Double>> data, int dimension, double baseVariance)
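Computes the best information gain-threshold pair for the given dimension on the given data.
Parameters:
data - The data to use.
dimension - The dimension to compute the best threshold over.
baseVariance - The variance of the data.

public int[] getDimensionsToConsider()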
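Description copied from interface: DimensionFilterableLearner
Gets the dimensions that the learner is to consider.
Specified by:
getDimensionsToConsider in interface DimensionFilterableLearner

public void setDimensionsToConsider(int... dimensionsToConsider)
Description copied from interface: DimensionFilterableLearner
Sets the dimensions that the learner is to consider.
Specified by:
setDimensionsToConsider in interface DimensionFilterableLearner
Parameters:
dimensionsToConsider - The array of vector dimensions to consider. Null means all of them are considered.

public int getMinSplitSize()
Gets the minimum split size.

public void setMinSplitSize(int minSplitSize)
Sets the minimum split size.
Parameters:
minSplitSize - The minimum split size. Must be positive.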
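
How the two settings documented above might interact during learning can be sketched as follows: a null dimensionsToConsider expands to every dimension, and minSplitSize rejects any split that leaves too few instances on either side. This is a hedged illustration and not the library's code. The class DimensionSearchSketch, its Split result type, the double[][] input layout, and the midpoint threshold are hypothetical names and choices introduced here.

```java
import java.util.Arrays;

/** Illustrative sketch only; all names here are hypothetical. */
final class DimensionSearchSketch {

    /** Chosen 0-based dimension, threshold, and variance-reduction gain. */
    static final class Split {
        final int dimension;
        final double threshold;
        final double gain;

        Split(int dimension, double threshold, double gain) {
            this.dimension = dimension;
            this.threshold = threshold;
            this.gain = gain;
        }
    }

    /** Null means "consider every dimension", as the parameter docs state. */
    static int[] resolveDimensions(int[] dimensionsToConsider, int dimensionality) {
        if (dimensionsToConsider != null) {
            return dimensionsToConsider.clone();
        }
        int[] all = new int[dimensionality];
        for (int d = 0; d < dimensionality; d++) {
            all[d] = d;
        }
        return all;
    }

    /** Sum of squared deviations from the mean, from running sums. */
    static double sse(double sum, double sumSq, int n) {
        return n == 0 ? 0.0 : Math.max(0.0, sumSq - sum * sum / n);
    }

    /** Returns the best split over the allowed dimensions, or null if none. */
    static Split learn(double[][] inputs, double[] outputs,
            int minSplitSize, int[] dimensionsToConsider) {
        if (minSplitSize < 1) {
            throw new IllegalArgumentException("minSplitSize must be positive");
        }
        final int n = inputs.length;
        if (n == 0) {
            return null;
        }
        double totalSum = 0.0;
        double totalSumSq = 0.0;
        for (double y : outputs) {
            totalSum += y;
            totalSumSq += y * y;
        }
        double baseVariance = sse(totalSum, totalSumSq, n) / n;

        Split best = null;
        for (int d : resolveDimensions(dimensionsToConsider, inputs[0].length)) {
            // Pair each feature value with its output and sort by feature.
            double[][] pairs = new double[n][];
            for (int i = 0; i < n; i++) {
                pairs[i] = new double[] {inputs[i][d], outputs[i]};
            }
            Arrays.sort(pairs, (a, b) -> Double.compare(a[0], b[0]));

            double leftSum = 0.0;
            double leftSumSq = 0.0;
            for (int k = 1; k < n; k++) {
                double y = pairs[k - 1][1];
                leftSum += y;
                leftSumSq += y * y;
                if (k < minSplitSize || n - k < minSplitSize) {
                    continue; // one side would be too small
                }
                if (pairs[k - 1][0] == pairs[k][0]) {
                    continue; // equal values cannot be separated
                }
                double within = (sse(leftSum, leftSumSq, k)
                    + sse(totalSum - leftSum, totalSumSq - leftSumSq, n - k)) / n;
                double gain = baseVariance - within;
                if (best == null || gain > best.gain) {
                    double threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0;
                    best = new Split(d, threshold, gain);
                }
            }
        }
        return best;
    }
}
```

Keeping running sums lets each dimension be scanned in linear time after sorting. Passing a non-null array such as new int[] {0} restricts the search to that dimension, even when another dimension would yield a larger gain.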