DataType - The type of the data to cluster. This is typically defined by the divergence function used.

@PublicationReference(author="Jeff Piersol", title="Parallel Mini-Batch k-means Clustering", type=Conference, year=2016, publication="to appear", url="to appear")
public class MiniBatchKMeansClusterer<DataType extends Vector> extends KMeansClusterer<Vector,MiniBatchCentroidCluster> implements Randomized
Modifier and Type | Class and Description |
---|---|
static class | MiniBatchKMeansClusterer.Builder<DataType extends Vector> - Can be used to create custom MiniBatchKMeansClusterers without using the big constructor. |
Modifier and Type | Field and Description |
---|---|
protected java.util.List<java.lang.Integer> | dataIndices - Indices of the data. |
static int | DEFAULT_MAX_ITERATIONS - The default maximum number of iterations is 100000. |
protected java.util.Random | random - The random number generator to use for initialization and subset selection. |
assignments, clusterCounts, clusters, DEFAULT_NUM_REQUESTED_CLUSTERS, divergenceFunction, initializer, numRequestedClusters
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration
Constructor and Description |
---|
MiniBatchKMeansClusterer(int numClusters) - Creates a clusterer with the default parameters. |
MiniBatchKMeansClusterer(int numClusters, int maxIterations, FixedClusterInitializer<MiniBatchCentroidCluster,Vector> initializer, Semimetric<? super Vector> metric, ClusterCreator<MiniBatchCentroidCluster,Vector> creator, java.util.Random random) - Creates a new MiniBatchKMeansClusterer. |
Modifier and Type | Method and Description |
---|---|
protected int[] | assignDataToClusters(java.util.Collection<? extends Vector> data) - Creates the cluster assignments given the current locations of clusters. |
protected void | cleanupAlgorithm() - Called to clean up the learning algorithm's state after learning has finished. |
MiniBatchKMeansClusterer<DataType> | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
java.util.List<? extends DataType> | getData() - Gets the data to use for learning. |
int | getMinibatchSize() - Get the size of the mini-batches used. |
java.util.Random | getRandom() - Gets the random number generator used by this object. |
double | getStoppingCriterion() - Get the stopping criterion for this clusterer. |
protected boolean | initializeAlgorithm() - Called to initialize the learning algorithm's state based on the data that is stored in the data field. |
protected void | saveFinalClustering() - Saves the final clustering for each data point. |
void | setData(java.util.Collection<? extends Vector> data) - Set the data to be clustered. |
void | setMinibatchSize(int minibatchSize) - Set the size of the mini-batches. |
void | setRandom(java.util.Random random) - Sets the random number generator used by this object. |
void | setStoppingCriterion(double stoppingCriterion) - Set the stopping criterion for this clusterer. |
protected boolean | step() - Do a step of the clustering algorithm. |
assignDataFromIndices, createClustersFromAssignments, getAssignments, getClosestClusterIndex, getCluster, getClusterCounts, getClusters, getCreator, getDivergenceFunction, getInitializer, getNumChanged, getNumClusters, getNumElements, getNumRequestedClusters, getPerformance, getResult, setAssignment, setClusters, setCreator, setDivergenceFunction, setInitializer, setNumChanged, setNumRequestedClusters
getKeepGoing, learn, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid
public static final int DEFAULT_MAX_ITERATIONS
protected java.util.Random random
protected java.util.List<java.lang.Integer> dataIndices
public MiniBatchKMeansClusterer(int numClusters)
numClusters - the number of clusters to output

public MiniBatchKMeansClusterer(int numClusters, int maxIterations, FixedClusterInitializer<MiniBatchCentroidCluster,Vector> initializer, Semimetric<? super Vector> metric, ClusterCreator<MiniBatchCentroidCluster,Vector> creator, java.util.Random random)
Creates a new MiniBatchKMeansClusterer.
numClusters - the number of clusters to create
maxIterations - the maximum number of iterations before stopping
initializer - sets the initial centroids
metric - the metric to use
creator - the cluster creator to use
random - the random number generator to use

public MiniBatchKMeansClusterer<DataType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy.
Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
Specified by: clone in interface CloneableSerializable
Overrides: clone in class KMeansClusterer<Vector,MiniBatchCentroidCluster>
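The "smart copy" contract described for clone() can be illustrated with a minimal sketch. It uses plain java.lang.Cloneable rather than this library's base classes, and the class SmartCopyExample and its fields are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class, used only to illustrate the clone contract. */
class SmartCopyExample implements Cloneable {
    // Primitive field: a shallow copy already duplicates it by value.
    int maxIterations = 100;

    // Reference field: a shallow copy would share this list with the original.
    List<Integer> dataIndices = new ArrayList<>();

    @Override
    public SmartCopyExample clone() {
        try {
            // Start from super.clone(), as the contract requires.
            SmartCopyExample copy = (SmartCopyExample) super.clone();
            // Deepen only the fields that must not be shared.
            copy.dataIndices = new ArrayList<>(this.dataIndices);
            return copy;
        } catch (CloneNotSupportedException e) {
            // Not expected, because this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

Because super.clone() already produces an object of the exact runtime type with every field copied, the override only has to replace the one field that needs its own storage.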
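protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field.
Overrides: initializeAlgorithm in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

protected boolean step()
Overrides: step in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

protected void saveFinalClustering()

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.
Overrides: cleanupAlgorithm in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

public java.util.Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.
Specified by: getRandom in interface Randomized

public final void setRandom(java.util.Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.
Specified by: setRandom in interface Randomized
random - The random number generator for this object to use.

public java.util.List<? extends DataType> getData()
Description copied from interface: AnytimeBatchLearner
Gets the data to use for learning.
Specified by: getData in interface AnytimeBatchLearner<java.util.Collection<? extends Vector>,java.util.Collection<MiniBatchCentroidCluster>>
Overrides: getData in class AbstractAnytimeBatchLearner<java.util.Collection<? extends Vector>,java.util.Collection<MiniBatchCentroidCluster>>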
public void setData(java.util.Collection<? extends Vector> data)
Set the data to be clustered. If the data is not a RandomAccess List, it will be copied into one.
Overrides: setData in class KMeansClusterer<Vector,MiniBatchCentroidCluster>
data - the data to be clustered

public double getStoppingCriterion()
Get the stopping criterion for this clusterer. See setStoppingCriterion(double) for details on the criterion.

public void setStoppingCriterion(double stoppingCriterion)
stoppingCriterion - if the fraction of samples that changed assignment is lower than this number, iteration stops. Set this to a negative value to disable early stopping.

public int getMinibatchSize()
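To make the role of the mini-batch size concrete, here is a rough sketch of drawing one mini-batch of data indices with a java.util.Random. It is an assumption about how subset selection could work, not this class's actual implementation, which may sample differently (for example, with replacement). The class MiniBatchSampleSketch and its method sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of the library. */
class MiniBatchSampleSketch {
    /**
     * Returns up to minibatchSize distinct indices drawn uniformly,
     * without replacement, from the range [0, numPoints).
     */
    static List<Integer> sample(int numPoints, int minibatchSize, Random random) {
        List<Integer> indices = new ArrayList<>(numPoints);
        for (int i = 0; i < numPoints; i++) {
            indices.add(i);
        }
        // Shuffle with the supplied generator so runs are reproducible per seed.
        Collections.shuffle(indices, random);
        // A batch cannot be larger than the data set itself.
        int size = Math.min(minibatchSize, numPoints);
        return new ArrayList<>(indices.subList(0, size));
    }
}
```

Passing the generator in explicitly mirrors the getRandom()/setRandom() pair: seeding it makes the selected batches repeatable.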
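public void setMinibatchSize(int minibatchSize)
minibatchSize - the size of the mini-batches

protected int[] assignDataToClusters(java.util.Collection<? extends Vector> data)
Description copied from class: KMeansClusterer
Creates the cluster assignments given the current locations of clusters.
Overrides: assignDataToClusters in class KMeansClusterer<Vector,MiniBatchCentroidCluster>
data - Data to assign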
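
Nearest-cluster assignment and the stopping criterion documented for setStoppingCriterion(double) can be sketched together as follows. This is an illustration under stated assumptions, not the library's code: points are plain double[] arrays, the divergence is squared Euclidean distance (the real class accepts any Semimetric), and the class AssignmentSketch and its methods are hypothetical.

```java
/** Hypothetical helper, not part of the library. */
class AssignmentSketch {
    /** Index of the centroid closest to the point (squared Euclidean distance). */
    static int closest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[k][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = k;
            }
        }
        return best;
    }

    /** Assigns every point to its closest centroid at the current centroid locations. */
    static int[] assign(double[][] data, double[][] centroids) {
        int[] assignments = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            assignments[i] = closest(data[i], centroids);
        }
        return assignments;
    }

    /**
     * True when the fraction of points whose assignment changed is lower than
     * the criterion. A negative criterion disables early stopping.
     */
    static boolean shouldStop(int[] previous, int[] current, double stoppingCriterion) {
        if (stoppingCriterion < 0.0 || current.length == 0) {
            return false;
        }
        int changed = 0;
        for (int i = 0; i < current.length; i++) {
            if (previous[i] != current[i]) {
                changed++;
            }
        }
        double fraction = (double) changed / current.length;
        return fraction < stoppingCriterion;
    }
}
```

The comparison is strict, matching the wording "lower than this number": a changed fraction exactly equal to the criterion does not stop iteration in this sketch.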