OutputType
- The output category type for the training data.
public abstract class AbstractVectorThresholdMaximumGainLearner<OutputType> extends AbstractCloneableSerializable implements VectorThresholdLearner<OutputType>
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MIN_SPLIT_SIZE: The default value for the minimum split size is 1. |
protected int[] | dimensionsToConsider: The array of dimensions for the learner to consider. |
protected int | minSplitSize: The threshold for allowing a split to be made, determined by how many instances fall on each of the left and right sides of the split. |
Constructor and Description |
---|
AbstractVectorThresholdMaximumGainLearner(): Creates a new AbstractVectorThresholdMaximumGainLearner. |
AbstractVectorThresholdMaximumGainLearner(int minSplitSize, int[] dimensionsToConsider): Creates a new AbstractVectorThresholdMaximumGainLearner. |
Modifier and Type | Method and Description |
---|---|
AbstractVectorThresholdMaximumGainLearner<OutputType> | clone(): This makes public the clone method on the Object class and removes the exception that it throws. |
DefaultPair<java.lang.Double,java.lang.Double> | computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts): Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension. |
protected DefaultPair<java.lang.Double,java.lang.Double> | computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts, java.util.ArrayList<DefaultWeightedValue<OutputType>> values): Computes the best gain and threshold for a given dimension using the computeSplitGain method for each potential split point of values for the given dimension. |
abstract double | computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts): Computes the gain of a given split. |
int[] | getDimensionsToConsider(): Gets the dimensions that the learner is to consider. |
int | getMinSplitSize(): Gets the minimum split size. |
VectorElementThresholdCategorizer | learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data): The learn method creates an object of ResultType using data of type DataType, using some form of "learning" algorithm. |
void | setDimensionsToConsider(int... dimensionsToConsider): Sets the dimensions that the learner is to consider. |
void | setMinSplitSize(int minSplitSize): Sets the minimum split size. |
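The summary above says that computeBestGainAndThreshold evaluates computeSplitGain at each potential split point of one dimension. The following is a minimal, self-contained sketch of that idea, not the library's implementation. Everything in it is a hypothetical stand-in: the ThresholdScanSketch and Sample names, the restriction to two categories, the use of information gain as the split-gain criterion, and the choice of the midpoint between neighboring values as the candidate threshold are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration; none of these names belong to the documented API. */
final class ThresholdScanSketch {

    /** One labeled example, reduced to its value in a single vector dimension. */
    static final class Sample {
        final double value;
        final boolean positive; // only two categories, to keep the sketch short

        Sample(double value, boolean positive) {
            this.value = value;
            this.positive = positive;
        }
    }

    /** Binary entropy in bits for a node holding the given counts. */
    static double entropy(int positives, int total) {
        if (total == 0 || positives == 0 || positives == total) {
            return 0.0;
        }
        double p = (double) positives / total;
        return -(p * Math.log(p) + (1.0 - p) * Math.log(1.0 - p)) / Math.log(2.0);
    }

    /** Information gain of a split (an assumed criterion, one of several possible). */
    static double splitGain(int basePos, int baseTotal, int leftPos, int leftTotal) {
        int rightPos = basePos - leftPos;
        int rightTotal = baseTotal - leftTotal;
        double leftWeight = (double) leftTotal / baseTotal;
        double rightWeight = (double) rightTotal / baseTotal;
        return entropy(basePos, baseTotal)
            - leftWeight * entropy(leftPos, leftTotal)
            - rightWeight * entropy(rightPos, rightTotal);
    }

    /** Returns {bestGain, bestThreshold}, or null when no split is allowed. */
    static double[] bestGainAndThreshold(List<Sample> data, int minSplitSize) {
        List<Sample> sorted = new ArrayList<>(data);
        sorted.sort(Comparator.comparingDouble(s -> s.value));

        int total = sorted.size();
        int basePos = 0;
        for (Sample s : sorted) {
            if (s.positive) {
                basePos++;
            }
        }

        double[] best = null;
        int leftPos = 0;
        // i is the number of samples that would fall on the left side.
        for (int i = 1; i < total; i++) {
            if (sorted.get(i - 1).positive) {
                leftPos++;
            }
            double previous = sorted.get(i - 1).value;
            double current = sorted.get(i).value;
            if (previous == current) {
                continue; // no threshold can separate equal values
            }
            if (i < minSplitSize || total - i < minSplitSize) {
                continue; // one side would be smaller than the minimum split size
            }
            double gain = splitGain(basePos, total, leftPos, i);
            if (best == null || gain > best[0]) {
                best = new double[] {gain, (previous + current) / 2.0};
            }
        }
        return best;
    }
}
```

In this sketch the minimum split size is applied to both sides of every candidate split, which matches the field description above; a subclass of the documented class would supply its own criterion by implementing computeSplitGain.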
public static final int DEFAULT_MIN_SPLIT_SIZE
protected int minSplitSize
protected int[] dimensionsToConsider
public AbstractVectorThresholdMaximumGainLearner()
Creates a new AbstractVectorThresholdMaximumGainLearner.
public AbstractVectorThresholdMaximumGainLearner(int minSplitSize, int[] dimensionsToConsider)
Creates a new AbstractVectorThresholdMaximumGainLearner.
minSplitSize
- The minimum split size. Must be positive.
dimensionsToConsider
- The array of vector dimensions to consider. Null means all of them
are considered.
public AbstractVectorThresholdMaximumGainLearner<OutputType> clone()
AbstractCloneableSerializable
This makes public the clone method on the Object class and
removes the exception that it throws. Its default behavior is to
automatically create a clone of the exact type of object that the
clone is called on and to copy all primitives but to keep all references,
which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, to target
the most common use case for creating a copy of the object. Because
the default behavior is a shallow copy, extending classes only need
to handle fields that need a deeper copy (or those that need to
be reset). Some of the methods in ObjectUtil may be helpful in
implementing a custom clone method.
Note: The contract of this method is that you must use super.clone()
as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractCloneableSerializable
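As an illustration of the "smart copy" contract described above, here is a minimal sketch built on plain java.lang.Cloneable rather than on AbstractCloneableSerializable. The class name SmartCopySketch and the decision to deep-copy the array field are assumptions for this example; the text above does not say which fields this learner copies deeply.

```java
/** Hypothetical class; it only mirrors the field names used in this page. */
class SmartCopySketch implements Cloneable {

    protected int minSplitSize = 1;
    protected int[] dimensionsToConsider = {0, 2};

    @Override
    public SmartCopySketch clone() {
        try {
            // Start from super.clone(): a shallow copy of the exact runtime type.
            SmartCopySketch copy = (SmartCopySketch) super.clone();
            // Deepen only the fields that must not be shared with the original.
            if (this.dimensionsToConsider != null) {
                copy.dimensionsToConsider = this.dimensionsToConsider.clone();
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

Primitives such as minSplitSize need no extra handling because the shallow copy already duplicates them.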
public VectorElementThresholdCategorizer learn(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data)
BatchLearner
The learn method creates an object of ResultType using
data of type DataType, using some form of "learning" algorithm.
learn in interface BatchLearner<java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>>,VectorElementThresholdCategorizer>
data
- The data that the learning algorithm will use to create an
object of ResultType.
public DefaultPair<java.lang.Double,java.lang.Double> computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts)
data
- The data to use to compute the threshold.
dimension
- The dimension to compute the threshold for.
baseCounts
- Information about the base category counts.
protected DefaultPair<java.lang.Double,java.lang.Double> computeBestGainAndThreshold(java.util.Collection<? extends InputOutputPair<? extends Vectorizable,OutputType>> data, int dimension, DefaultDataDistribution<OutputType> baseCounts, java.util.ArrayList<DefaultWeightedValue<OutputType>> values)
data
- The data to use to compute the threshold.
dimension
- The dimension to compute the threshold for.
baseCounts
- Information about the base category counts.
values
- A workspace to store the values of the data in. Recycled to avoid
recreating a large array each time.
public abstract double computeSplitGain(DefaultDataDistribution<OutputType> baseCounts, DefaultDataDistribution<OutputType> positiveCounts, DefaultDataDistribution<OutputType> negativeCounts)
baseCounts
- The base category information before splitting. Contains the sum of
the positive and negative counts.
positiveCounts
- The category information on the positive side of the split.
negativeCounts
- The category information on the negative side of the split.
public int[] getDimensionsToConsider()
DimensionFilterableLearner
getDimensionsToConsider in interface DimensionFilterableLearner
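The constructor documentation above states that a null dimensionsToConsider array means all dimensions are considered. A caller reading the value back from getDimensionsToConsider therefore has to handle null. The helper below is a hypothetical sketch of one way to do that; the names DimensionFilterSketch and resolveDimensions are not part of the documented API, and the library may resolve the null case differently internally.

```java
/** Hypothetical helper; not part of the documented API. */
final class DimensionFilterSketch {

    /**
     * Returns the dimension indices to scan: a copy of the given array,
     * or 0 .. dimensionality - 1 when the array is null ("consider all").
     */
    static int[] resolveDimensions(int[] dimensionsToConsider, int dimensionality) {
        if (dimensionsToConsider != null) {
            return dimensionsToConsider.clone();
        }
        int[] all = new int[dimensionality];
        for (int i = 0; i < dimensionality; i++) {
            all[i] = i;
        }
        return all;
    }
}
```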
public void setDimensionsToConsider(int... dimensionsToConsider)
DimensionFilterableLearner
setDimensionsToConsider in interface DimensionFilterableLearner
dimensionsToConsider
- The array of vector dimensions to consider. Null means all of them
are considered.
public int getMinSplitSize()
public void setMinSplitSize(int minSplitSize)
minSplitSize
- The minimum split size. Must be positive.