InputType
- The type of the input for the categorizer to learn. This is the type
passed to the internal batch learner to learn each ensemble member.
CategoryType
- The type of the category that is the output for the categorizer to
learn. It is also passed to the internal batch learner to learn each
ensemble member. It must have valid equals and hashCode methods.
@PublicationReference(author="Leo Breiman", title="Pasting small votes for classification in large databases and on-line", year=1999, type=Journal, publication="Machine Learning", pages={85,103}, url="http://www.springerlink.com/content/mnu2r28218651707/fulltext.pdf")
public class IVotingCategorizerLearner<InputType,CategoryType>
extends AbstractAnytimeSupervisedBatchLearner<InputType,CategoryType,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
implements Randomized, BatchLearnerContainer<BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>>>, BagBasedCategorizerEnsembleLearner<InputType,CategoryType>
Modifier and Type | Class and Description |
---|---|
static class |
IVotingCategorizerLearner.OutOfBagErrorStoppingCriteria<InputType,CategoryType>
Implements a stopping criteria for IVoting that uses the out-of-bag
error to determine when to stop learning the ensemble.
|
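This page does not specify the rule that OutOfBagErrorStoppingCriteria applies. As a rough illustration of the general idea only, the sketch below computes an error rate from a per-example correctness array (such as the one returned by getCurrentEnsembleCorrect()) and stops once that rate has failed to improve for a fixed number of consecutive steps. The class name OutOfBagStopSketch and the patience parameter are hypothetical and are not part of the library.

```java
/** Hypothetical sketch; this is NOT the library's OutOfBagErrorStoppingCriteria. */
final class OutOfBagStopSketch {
    // Assumption: stop after this many consecutive steps without improvement.
    private final int patience;
    private double bestError = Double.POSITIVE_INFINITY;
    private int stepsSinceImprovement = 0;

    OutOfBagStopSketch(int patience) {
        if (patience < 1) {
            throw new IllegalArgumentException("patience must be at least 1");
        }
        this.patience = patience;
    }

    /** Fraction of examples marked incorrect; 0.0 for an empty array. */
    static double errorRate(boolean[] correct) {
        if (correct.length == 0) {
            return 0.0;
        }
        int wrong = 0;
        for (boolean c : correct) {
            if (!c) {
                wrong++;
            }
        }
        return (double) wrong / correct.length;
    }

    /** Call once per learning step; returns true when learning should stop. */
    boolean shouldStop(boolean[] outOfBagCorrect) {
        double error = errorRate(outOfBagCorrect);
        if (error < bestError) {
            bestError = error;
            stepsSinceImprovement = 0;
        } else {
            stepsSinceImprovement++;
        }
        return stepsSinceImprovement >= patience;
    }
}
```

A stricter or smoothed criterion could be substituted for the strict "less than" comparison; the choice here is only meant to keep the example short.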
Modifier and Type | Field and Description |
---|---|
protected Factory<? extends DataDistribution<CategoryType>> |
counterFactory
Factory for counting votes.
|
protected java.util.ArrayList<InputOutputPair<? extends InputType,CategoryType>> |
currentBag
The current bag used to train the current ensemble member.
|
protected java.util.ArrayList<java.lang.Integer> |
currentCorrectIndices
The indices of examples that the ensemble currently gets correct.
|
protected boolean[] |
currentEnsembleCorrect
A boolean for each example indicating if the ensemble currently gets the
example correct or incorrect.
|
protected java.util.ArrayList<java.lang.Integer> |
currentIncorrectIndices
The indices of examples that the ensemble currently gets incorrect.
|
protected Evaluator<? super InputType,? extends CategoryType> |
currentMember
The currently learned member of the ensemble.
|
protected java.util.ArrayList<CategoryType> |
currentMemberEstimates
The estimates of the current member for each example.
|
protected java.util.ArrayList<DataDistribution<CategoryType>> |
dataFullEstimates
The running estimate of the ensemble for each example.
|
protected int[] |
dataInBag
A counter for each example indicating how many times it exists in the
current bag.
|
protected java.util.ArrayList<? extends InputOutputPair<? extends InputType,CategoryType>> |
dataList
The data represented as an array list.
|
protected java.util.ArrayList<DataDistribution<CategoryType>> |
dataOutOfBagEstimates
The running estimate of the ensemble for each example where an ensemble
member can only vote on elements that were not in the bag used to train
it.
|
static int |
DEFAULT_MAX_ITERATIONS
The default maximum number of iterations is 100.
|
static double |
DEFAULT_PERCENT_TO_SAMPLE
The default percent to sample is 0.1.
|
static double |
DEFAULT_PROPORTION_INCORRECT_IN_SAMPLE
By default, use 50% incorrect (and 50% correct) examples in the percent to sample.
|
static boolean |
DEFAULT_VOTE_OUT_OF_BAG_ONLY
The default value for whether to vote out-of-bag only.
|
protected WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> |
ensemble
The current ensemble.
|
protected BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> |
learner
The learner used to produce each ensemble member.
|
protected int |
numCorrectToSample
The number of correct examples to sample on each iteration.
|
protected int |
numIncorrectToSample
The number of incorrect examples to sample on each iteration.
|
protected double |
percentToSample
The percent to sample on each iteration.
|
protected double |
proportionIncorrectInSample
The proportion of incorrect examples in each sample.
|
protected java.util.Random |
random
The random number generator to use.
|
protected int |
sampleSize
The size of sample to create on each iteration.
|
protected boolean |
voteOutOfBagOnly
Controls whether or not an ensemble member can vote on items it was
trained on during learning.
|
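The fields sampleSize, numCorrectToSample, and numIncorrectToSample are presumably derived from percentToSample, proportionIncorrectInSample, and the number of training examples. The sketch below shows one plausible way to compute them. The rounding rule, the minimum bag size of one, and the class name SampleSizes are assumptions made for illustration; the library's actual arithmetic may differ.

```java
/** Hypothetical helper, not part of the library: derives per-iteration sample counts. */
final class SampleSizes {
    final int sampleSize;
    final int numIncorrectToSample;
    final int numCorrectToSample;

    SampleSizes(int dataSize, double percentToSample, double proportionIncorrectInSample) {
        if (percentToSample <= 0.0) {
            throw new IllegalArgumentException("percentToSample must be positive");
        }
        if (proportionIncorrectInSample < 0.0 || proportionIncorrectInSample > 1.0) {
            throw new IllegalArgumentException(
                "proportionIncorrectInSample must be between 0.0 and 1.0");
        }
        // Assumption: round to the nearest integer and keep at least one example per bag.
        this.sampleSize = Math.max(1, (int) Math.round(percentToSample * dataSize));
        // Assumption: the incorrect share is rounded; the correct share is the remainder.
        this.numIncorrectToSample =
            (int) Math.round(proportionIncorrectInSample * this.sampleSize);
        this.numCorrectToSample = this.sampleSize - this.numIncorrectToSample;
    }
}
```

With the documented defaults (percent to sample 0.1, 50% incorrect) and 1000 training examples, this sketch yields a bag of 100 examples split evenly between correct and incorrect ones.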
Inherited fields:
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration
Constructor and Description |
---|
IVotingCategorizerLearner()
Creates a new IVotingCategorizerLearner. |
IVotingCategorizerLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner,
int maxIterations,
double percentToSample,
double proportionIncorrectInSample,
boolean voteOutOfBagOnly,
Factory<? extends DataDistribution<CategoryType>> counterFactory,
java.util.Random random)
Creates a new IVotingCategorizerLearner. |
IVotingCategorizerLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner,
int maxIterations,
double percentToSample,
java.util.Random random)
Creates a new IVotingCategorizerLearner. |
Modifier and Type | Method and Description |
---|---|
protected void |
cleanupAlgorithm()
Called to clean up the learning algorithm's state after learning has
finished.
|
protected void |
createBag(java.util.ArrayList<java.lang.Integer> correctIndices,
java.util.ArrayList<java.lang.Integer> incorrectIndices)
Create the next sample (bag) of examples to learn the next ensemble
member from.
|
Factory<? extends DataDistribution<CategoryType>> |
getCounterFactory()
Gets the factory used for creating the object for counting the votes of
the learned ensemble members.
|
boolean[] |
getCurrentEnsembleCorrect()
Gets whether or not the current ensemble gets each example correct.
|
java.util.List<DataDistribution<CategoryType>> |
getDataFullEstimates()
Gets the current estimates for each data point.
|
int[] |
getDataInBag()
Gets the counter for each example indicating how many times it exists
in the current bag.
|
java.util.List<DataDistribution<CategoryType>> |
getDataOutOfBagEstimates()
Gets the current out-of-bag estimates for each data point.
|
InputOutputPair<? extends InputType,CategoryType> |
getExample(int index)
Gets the training example at the given index.
|
BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> |
getLearner()
Gets the learner used to learn each ensemble member.
|
double |
getPercentToSample()
Gets the percentage of the total data to sample on each iteration.
|
double |
getProportionIncorrectInSample()
Gets the proportion of incorrect examples to place in each sample.
|
java.util.Random |
getRandom()
Gets the random number generator used by this object.
|
WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> |
getResult()
Gets the current result of the algorithm.
|
protected boolean |
initializeAlgorithm()
Called to initialize the learning algorithm's state based on the
data that is stored in the data field.
|
boolean |
isVoteOutOfBagOnly()
Gets whether, during learning, ensemble members can only vote on items
that are not in their bag (training set).
|
protected static <DataType> void |
sampleIndicesWithReplacementInto(java.util.ArrayList<java.lang.Integer> fromIndices,
java.util.ArrayList<? extends DataType> baseData,
int numToSample,
java.util.Random random,
java.util.ArrayList<DataType> output,
int[] dataInBag)
Takes the given number of samples from the given list and places them in
the given output list.
|
void |
setCounterFactory(Factory<? extends DataDistribution<CategoryType>> counterFactory)
Sets the factory used for creating the object for counting the votes of
the learned ensemble members.
|
void |
setLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner)
Sets the learner used to learn each ensemble member.
|
void |
setPercentToSample(double percentToSample)
Sets the percentage of the data to sample (with replacement) on each
iteration.
|
void |
setProportionIncorrectInSample(double proportionIncorrectInSample)
Sets the proportion of incorrect examples to place in each sample.
|
void |
setRandom(java.util.Random random)
Sets the random number generator used by this object.
|
void |
setVoteOutOfBagOnly(boolean voteOutOfBagOnly)
Sets whether, during learning, ensemble members can only vote on items
that are not in their bag (training set).
|
protected boolean |
step()
Called to take a single step of the learning algorithm.
|
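The sampleIndicesWithReplacementInto entry in the table above describes drawing a fixed number of examples, with replacement, from a subset of the data identified by index, while counting how often each example lands in the bag. A minimal self-contained sketch of that technique follows. It mirrors the documented parameter list but accepts List rather than ArrayList, and the method body and the empty-list check are assumptions, not the library's implementation.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of index-based sampling with replacement. */
final class BagSampling {
    static <T> void sampleIndicesWithReplacementInto(
            List<Integer> fromIndices,
            List<? extends T> baseData,
            int numToSample,
            Random random,
            List<T> output,
            int[] dataInBag) {
        if (numToSample < 0) {
            throw new IllegalArgumentException("numToSample must be non-negative");
        }
        // Assumption: sampling from an empty index list is treated as an error.
        if (numToSample > 0 && fromIndices.isEmpty()) {
            throw new IllegalArgumentException("fromIndices must not be empty");
        }
        for (int i = 0; i < numToSample; i++) {
            // Pick a position in the index list uniformly at random; repeats are allowed.
            int index = fromIndices.get(random.nextInt(fromIndices.size()));
            output.add(baseData.get(index));
            // Track how many times this example appears in the current bag.
            dataInBag[index]++;
        }
    }
}
```

Calling this once with the correct indices and once with the incorrect indices, each with its own sample count, would fill a bag in the way createBag is described.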
Inherited methods:
clone, getData, getKeepGoing, learn, setData, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final int DEFAULT_MAX_ITERATIONS
public static final double DEFAULT_PERCENT_TO_SAMPLE
public static final double DEFAULT_PROPORTION_INCORRECT_IN_SAMPLE
public static final boolean DEFAULT_VOTE_OUT_OF_BAG_ONLY
protected BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner
protected double percentToSample
protected double proportionIncorrectInSample
protected boolean voteOutOfBagOnly
protected Factory<? extends DataDistribution<CategoryType>> counterFactory
protected java.util.Random random
protected transient WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> ensemble
protected transient java.util.ArrayList<? extends InputOutputPair<? extends InputType,CategoryType>> dataList
protected transient java.util.ArrayList<DataDistribution<CategoryType>> dataFullEstimates
protected transient java.util.ArrayList<DataDistribution<CategoryType>> dataOutOfBagEstimates
protected transient boolean[] currentEnsembleCorrect
protected transient java.util.ArrayList<java.lang.Integer> currentCorrectIndices
protected transient java.util.ArrayList<java.lang.Integer> currentIncorrectIndices
protected transient int sampleSize
protected transient int numCorrectToSample
protected transient int numIncorrectToSample
protected transient java.util.ArrayList<InputOutputPair<? extends InputType,CategoryType>> currentBag
protected transient int[] dataInBag
protected transient Evaluator<? super InputType,? extends CategoryType> currentMember
protected transient java.util.ArrayList<CategoryType> currentMemberEstimates
public IVotingCategorizerLearner()
Creates a new IVotingCategorizerLearner.
public IVotingCategorizerLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner, int maxIterations, double percentToSample, java.util.Random random)
Creates a new IVotingCategorizerLearner.
learner
- The learner to use to create the categorizer on each iteration.
maxIterations
- The maximum number of iterations to run for, which is also the
number of learners to create.
percentToSample
- The percentage of the total size of the data to sample on each
iteration. Must be positive.
random
- The random number generator to use.
public IVotingCategorizerLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner, int maxIterations, double percentToSample, double proportionIncorrectInSample, boolean voteOutOfBagOnly, Factory<? extends DataDistribution<CategoryType>> counterFactory, java.util.Random random)
Creates a new IVotingCategorizerLearner.
learner
- The learner to use to create the categorizer on each iteration.
maxIterations
- The maximum number of iterations to run for, which is also the
number of learners to create.
percentToSample
- The percentage of the total size of the data to sample on each
iteration. Must be positive.
proportionIncorrectInSample
- The proportion of incorrect examples to put in each sample. Must
be between 0.0 and 1.0 (inclusive).
voteOutOfBagOnly
- Controls whether or not in-bag or out-of-bag votes are used to
determine accuracy.
counterFactory
- The factory for counting votes.
random
- The random number generator to use.
protected boolean initializeAlgorithm()
Specified by: initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
protected boolean step()
Specified by: step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
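The three protected hooks initializeAlgorithm(), step(), and cleanupAlgorithm() suggest a template-method loop driven by the superclass. The sketch below illustrates how such a loop might use them together with maxIterations, keepGoing, and stop(). It assumes that step() returns true while learning should continue and that cleanup runs after the loop ends; neither detail is stated on this page, and the class AnytimeLoopSketch is hypothetical rather than the library's AbstractAnytimeBatchLearner.

```java
/** Hypothetical sketch of an anytime learning loop built from three hooks. */
abstract class AnytimeLoopSketch {
    protected int maxIterations = 100;
    protected int iteration = 0;
    protected boolean keepGoing = false;

    /** Prepares state; returning false means there is nothing to learn. */
    protected abstract boolean initializeAlgorithm();

    /** Performs one step; assumed to return true while learning should continue. */
    protected abstract boolean step();

    /** Releases any temporary state after the loop ends. */
    protected abstract void cleanupAlgorithm();

    /** Runs the loop and returns the number of steps taken. */
    public final int run() {
        iteration = 0;
        keepGoing = initializeAlgorithm();
        while (keepGoing && iteration < maxIterations) {
            iteration++;
            // step() runs first, so a stop() call made during the step is honored.
            keepGoing = step() && keepGoing;
        }
        cleanupAlgorithm();
        return iteration;
    }

    /** Requests that the loop end after the current step. */
    public void stop() {
        keepGoing = false;
    }
}
```

Under this reading, each successful step adds one ensemble member, which is consistent with the constructor documentation describing maxIterations as also being the number of learners to create.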
protected void createBag(java.util.ArrayList<java.lang.Integer> correctIndices, java.util.ArrayList<java.lang.Integer> incorrectIndices)
correctIndices
- The list of indices the ensemble is currently getting correct.
incorrectIndices
- The list of indices the ensemble is currently getting incorrect.
protected static <DataType> void sampleIndicesWithReplacementInto(java.util.ArrayList<java.lang.Integer> fromIndices, java.util.ArrayList<? extends DataType> baseData, int numToSample, java.util.Random random, java.util.ArrayList<DataType> output, int[] dataInBag)
DataType
- The data type to sample.
fromIndices
- The indices into the given base data to sample from.
baseData
- The list to sample from using the given list of indices.
numToSample
- The number to sample. Must be non-negative.
random
- The random number generator to use.
output
- The list to add the samples to.
dataInBag
- The array of counters for the number of times each example is
sampled.
protected void cleanupAlgorithm()
Specified by: cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
public WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>> getResult()
Specified by: getResult in interface AnytimeAlgorithm<WeightedVotingCategorizerEnsemble<InputType,CategoryType,Evaluator<? super InputType,? extends CategoryType>>>
public int[] getDataInBag()
Specified by: getDataInBag in interface BagBasedCategorizerEnsembleLearner<InputType,CategoryType>
public InputOutputPair<? extends InputType,CategoryType> getExample(int index)
Specified by: getExample in interface BagBasedCategorizerEnsembleLearner<InputType,CategoryType>
index
- The 0-based index to look up.
public BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> getLearner()
Specified by: getLearner in interface BatchLearnerContainer<BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>>>
public void setLearner(BatchLearner<? super java.util.Collection<? extends InputOutputPair<? extends InputType,CategoryType>>,? extends Evaluator<? super InputType,? extends CategoryType>> learner)
learner
- The learner used for each ensemble member.
public double getPercentToSample()
public void setPercentToSample(double percentToSample)
percentToSample
- The percent of the data to sample on each iteration. Must be greater
than zero. Defaults to DEFAULT_PERCENT_TO_SAMPLE (0.1).
public double getProportionIncorrectInSample()
public void setProportionIncorrectInSample(double proportionIncorrectInSample)
proportionIncorrectInSample
- The proportion of incorrect examples in each sample. Must be between
0.0 and 1.0 (inclusive).
public boolean isVoteOutOfBagOnly()
public void setVoteOutOfBagOnly(boolean voteOutOfBagOnly)
voteOutOfBagOnly
- If out-of-bag-only voting should be enabled.
public Factory<? extends DataDistribution<CategoryType>> getCounterFactory()
public void setCounterFactory(Factory<? extends DataDistribution<CategoryType>> counterFactory)
counterFactory
- The factory used to create the vote counting objects.
public java.util.Random getRandom()
Specified by: getRandom in interface Randomized
public void setRandom(java.util.Random random)
Specified by: setRandom in interface Randomized
random
- The random number generator for this object to use.
public java.util.List<DataDistribution<CategoryType>> getDataFullEstimates()
public java.util.List<DataDistribution<CategoryType>> getDataOutOfBagEstimates()
public boolean[] getCurrentEnsembleCorrect()
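The accessors getDataInBag(), getDataOutOfBagEstimates(), and getCurrentEnsembleCorrect() expose the bookkeeping behind out-of-bag voting: a member only votes on examples that were absent from its training bag, and the ensemble counts as correct on an example when its top-voted category matches the label. The sketch below illustrates that bookkeeping with plain hash maps in place of DataDistribution. The class OutOfBagVoting, its method names, and the choice to treat an example with no votes as incorrect are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of out-of-bag vote counting using plain maps. */
final class OutOfBagVoting {
    /** Adds one member's predictions as votes, optionally skipping in-bag examples. */
    static <C> void addVotes(
            List<Map<C, Integer>> counts,
            List<C> memberEstimates,
            int[] dataInBag,
            boolean voteOutOfBagOnly) {
        for (int i = 0; i < counts.size(); i++) {
            if (voteOutOfBagOnly && dataInBag[i] > 0) {
                continue; // The member was trained on this example, so it may not vote.
            }
            // Keys rely on equals and hashCode, as required of CategoryType.
            counts.get(i).merge(memberEstimates.get(i), 1, Integer::sum);
        }
    }

    /** Marks each example correct when its top-voted category equals its label. */
    static <C> boolean[] ensembleCorrect(List<Map<C, Integer>> counts, List<C> labels) {
        boolean[] correct = new boolean[labels.size()];
        for (int i = 0; i < correct.length; i++) {
            C best = null;
            int bestCount = 0;
            for (Map.Entry<C, Integer> entry : counts.get(i).entrySet()) {
                if (entry.getValue() > bestCount) {
                    best = entry.getKey();
                    bestCount = entry.getValue();
                }
            }
            // Assumption: an example with no votes yet is counted as incorrect.
            correct[i] = labels.get(i).equals(best);
        }
        return correct;
    }
}
```

The indices where the returned array is false would feed the incorrect-index list used to build the next bag, and the remaining indices the correct-index list.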