DataType - The type of the data to cluster. This is typically defined by the divergence function used.

@PublicationReference(author="Jeff Piersol", title="Parallel Mini-Batch k-means Clustering", type=Conference, year=2016, publication="to appear", url="to appear")
public class MiniBatchKMeansClusterer<DataType extends Vector>
extends KMeansClusterer<Vector,MiniBatchCentroidCluster>
implements Randomized

| Modifier and Type | Class and Description |
|---|---|
| static class | MiniBatchKMeansClusterer.Builder<DataType extends Vector>: Can be used to create custom MiniBatchKMeansClusterers without using the big constructor. |

| Modifier and Type | Field and Description |
|---|---|
| protected java.util.List<java.lang.Integer> | dataIndices: Indices of the data. |
| static int | DEFAULT_MAX_ITERATIONS: The default maximum number of iterations is 100000. |
| protected java.util.Random | random: The random number generator to use for initialization and subset selection. |
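
The field table pairs a list of data indices with a random number generator used for subset selection. The page does not say how a subset is drawn, so the following is only a minimal sketch of one common approach: shuffle a copy of the index list and keep the first few entries. The class `MiniBatchSampler` and its method `sampleBatch` are hypothetical names for illustration and are not part of MiniBatchKMeansClusterer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of MiniBatchKMeansClusterer. */
final class MiniBatchSampler
{
    /**
     * Returns up to batchSize distinct indices drawn uniformly at random
     * from dataIndices. The caller's list is left unmodified.
     */
    static List<Integer> sampleBatch(
        List<Integer> dataIndices, int batchSize, Random random)
    {
        if (batchSize < 0)
        {
            throw new IllegalArgumentException("batchSize must be >= 0");
        }
        // Shuffle a copy so the original index order is preserved.
        List<Integer> shuffled = new ArrayList<>(dataIndices);
        Collections.shuffle(shuffled, random);
        // Never return more indices than there are data points.
        int size = Math.min(batchSize, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, size));
    }
}
```

Passing a seeded `java.util.Random` makes the selection repeatable, which is the usual reason for exposing the generator through a field or setter.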
Inherited fields:
assignments, clusterCounts, clusters, DEFAULT_NUM_REQUESTED_CLUSTERS, divergenceFunction, initializer, numRequestedClusters
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration

| Constructor and Description |
|---|
| MiniBatchKMeansClusterer(int numClusters): Create a clusterer with the default parameters. |
| MiniBatchKMeansClusterer(int numClusters, int maxIterations, FixedClusterInitializer<MiniBatchCentroidCluster,Vector> initializer, Semimetric<? super Vector> metric, ClusterCreator<MiniBatchCentroidCluster,Vector> creator, java.util.Random random): Creates a new MiniBatchKMeansClusterer. |

| Modifier and Type | Method and Description |
|---|---|
| protected int[] | assignDataToClusters(java.util.Collection<? extends Vector> data): Creates the cluster assignments given the current locations of clusters. |
| protected void | cleanupAlgorithm(): Called to clean up the learning algorithm's state after learning has finished. |
| MiniBatchKMeansClusterer<DataType> | clone(): This makes public the clone method on the Object class and removes the exception that it throws. |
| java.util.List<? extends DataType> | getData(): Gets the data to use for learning. |
| int | getMinibatchSize(): Get the size of the mini-batches used. |
| java.util.Random | getRandom(): Gets the random number generator used by this object. |
| double | getStoppingCriterion(): Get the stopping criterion for this clusterer. |
| protected boolean | initializeAlgorithm(): Called to initialize the learning algorithm's state based on the data that is stored in the data field. |
| protected void | saveFinalClustering(): Saves the final clustering for each data point. |
| void | setData(java.util.Collection<? extends Vector> data): Set the data to be clustered. |
| void | setMinibatchSize(int minibatchSize): Set the size of the mini-batches. |
| void | setRandom(java.util.Random random): Sets the random number generator used by this object. |
| void | setStoppingCriterion(double stoppingCriterion): Set the stopping criterion for this clusterer. |
| protected boolean | step(): Do a step of the clustering algorithm. |
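
The stopping rule documented for setStoppingCriterion(double) says that iteration stops when the fraction of samples that changed assignment is lower than the criterion, and that a negative value disables early stopping. A minimal sketch of that check follows. The class `StoppingRule` and method `shouldStop` are hypothetical names, and whether the real class counts changes over a mini-batch or over the whole data set is not stated on this page.

```java
/** Hypothetical helper illustrating the documented stopping rule. */
final class StoppingRule
{
    /**
     * Returns true when the fraction of samples whose assignment changed
     * is strictly lower than stoppingCriterion.
     */
    static boolean shouldStop(
        int numChanged, int numSamples, double stoppingCriterion)
    {
        // A negative criterion disables early stopping, as documented.
        if (stoppingCriterion < 0.0)
        {
            return false;
        }
        // Assumption of this sketch: with no samples there is nothing
        // to measure, so keep going.
        if (numSamples <= 0)
        {
            return false;
        }
        double fractionChanged = (double) numChanged / numSamples;
        return fractionChanged < stoppingCriterion;
    }
}
```

Because the comparison is strict, a criterion of 0.0 never triggers an early stop in this sketch; the run would then end only at the maximum number of iterations.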
Inherited methods:
assignDataFromIndices, createClustersFromAssignments, getAssignments, getClosestClusterIndex, getCluster, getClusterCounts, getClusters, getCreator, getDivergenceFunction, getInitializer, getNumChanged, getNumClusters, getNumElements, getNumRequestedClusters, getPerformance, getResult, setAssignment, setClusters, setCreator, setDivergenceFunction, setInitializer, setNumChanged, setNumRequestedClusters
getKeepGoing, learn, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid

public static final int DEFAULT_MAX_ITERATIONS
protected java.util.Random random
protected java.util.List<java.lang.Integer> dataIndices
public MiniBatchKMeansClusterer(int numClusters)
numClusters - the number of clusters to output

public MiniBatchKMeansClusterer(int numClusters,
int maxIterations,
FixedClusterInitializer<MiniBatchCentroidCluster,Vector> initializer,
Semimetric<? super Vector> metric,
ClusterCreator<MiniBatchCentroidCluster,Vector> creator,
java.util.Random random)
Creates a new MiniBatchKMeansClusterer.
numClusters - the number of clusters to create
maxIterations - the maximum number of iterations before stopping
initializer - sets the initial centroids
metric - the metric to use
creator - the cluster creator to use
random - the random number generator to use

public MiniBatchKMeansClusterer<DataType> clone()
Description copied from class: AbstractCloneableSerializableObject
This makes public the clone method on the Object class and
removes the exception that it throws. Its default behavior is to
automatically create a clone of the exact type of object that the
clone is called on and to copy all primitives but to keep all references,
which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, one that targets
the most common use case for creating a copy of the object. Because
the default behavior is a shallow copy, extending classes only need
to handle fields that need a deeper copy (or those that need to
be reset). Some of the methods in ObjectUtil may be helpful in
implementing a custom clone method.
Note: The contract of this method is that you must use
super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

protected boolean initializeAlgorithm()
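Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field.
initializeAlgorithm in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

protected boolean step()
step in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

protected void saveFinalClustering()

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.
cleanupAlgorithm in class KMeansClusterer<Vector,MiniBatchCentroidCluster>

public java.util.Random getRandom()
Description copied from interface: Randomized
Gets the random number generator used by this object.
getRandom in interface Randomized

public final void setRandom(java.util.Random random)
Description copied from interface: Randomized
Sets the random number generator used by this object.
setRandom in interface Randomized
random - The random number generator for this object to use.

public java.util.List<? extends DataType> getData()
Description copied from interface: AnytimeBatchLearner
Gets the data to use for learning.
getData in interface AnytimeBatchLearner<java.util.Collection<? extends Vector>,java.util.Collection<MiniBatchCentroidCluster>>
getData in class AbstractAnytimeBatchLearner<java.util.Collection<? extends Vector>,java.util.Collection<MiniBatchCentroidCluster>>

public void setData(java.util.Collection<? extends Vector> data)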
Set the data to be clustered. If the data is not a RandomAccess List, it will be copied into one.
setData in class KMeansClusterer<Vector,MiniBatchCentroidCluster>
data - the data to be clustered

public double getStoppingCriterion()
Get the stopping criterion for this clusterer. See setStoppingCriterion(double) for details on the criterion.

public void setStoppingCriterion(double stoppingCriterion)
stoppingCriterion - if the fraction of samples that changed
assignment is lower than this number, iteration stops. Set this to a
negative value to disable early stopping.

public int getMinibatchSize()

public void setMinibatchSize(int minibatchSize)
minibatchSize - the size of the mini-batches

protected int[] assignDataToClusters(java.util.Collection<? extends Vector> data)
Description copied from class: KMeansClusterer
Creates the cluster assignments given the current locations of clusters.
assignDataToClusters in class KMeansClusterer<Vector,MiniBatchCentroidCluster>
data - Data to assign
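
As a rough illustration of what creating assignments from the current cluster locations can look like, the sketch below gives each point the index of its nearest centroid and returns the result as an int[]. It is written under stated assumptions: it uses plain double[] arrays and squared Euclidean distance, whereas the real class works on Vector data with the configured divergence function, and the class `NearestCentroidAssignment` with its methods is hypothetical.

```java
/** Hypothetical helper; not part of the documented API. */
final class NearestCentroidAssignment
{
    /**
     * Returns, for each data point, the index of the closest centroid.
     * An entry is -1 when there are no centroids to choose from.
     */
    static int[] assign(double[][] data, double[][] centroids)
    {
        int[] assignments = new int[data.length];
        for (int i = 0; i < data.length; i++)
        {
            int best = -1;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++)
            {
                double distance = squaredDistance(data[i], centroids[c]);
                // Strict comparison keeps the lowest index on ties.
                if (distance < bestDistance)
                {
                    bestDistance = distance;
                    best = c;
                }
            }
            assignments[i] = best;
        }
        return assignments;
    }

    /** Squared Euclidean distance between two equal-length arrays. */
    static double squaredDistance(double[] a, double[] b)
    {
        if (a.length != b.length)
        {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int d = 0; d < a.length; d++)
        {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Comparing two successive assignment arrays from such a routine yields the count of changed samples that a stopping criterion based on the fraction of changed assignments needs.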