Type Parameters:
DataType - The type of the data to cluster. This is typically defined by the divergence function used.
ClusterType - The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.

@CodeReview(reviewer="Kevin R. Dixon",date="2008-10-06",changesNeeded=true,comments={"The constructors for this class are not user friendly.","I\'ve been trying to write a test GUI for k-means for over an hour and STILL can\'t figure out the combination of classes to configure the constructor.","Please make a constructor that configures the class with meaningful, user-friendly default arguments."})
@CodeReview(reviewer="Kevin R. Dixon",date="2008-07-22",changesNeeded=false,comments={"Changed the condition to be \'members.size() > 0\' instead of 1 in createClustersFromAssignments()","Cleaned up javadoc.","Code generally looks fine."})
@PublicationReference(author="Wikipedia",title="K-means algorithm",type=WebPage,year=2008,url="http://en.wikipedia.org/wiki/K-means_algorithm")
@PublicationReference(author="Matteo Matteucci",title="A Tutorial on Clustering Algorithms: k-means Demo",type=WebPage,year=2008,url="http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html")
public class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
extends AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
implements BatchClusterer<DataType,ClusterType>, MeasurablePerformanceAlgorithm, DivergenceFunctionContainer<ClusterType,DataType>

The KMeansClusterer class implements the standard k-means (k-centroids) clustering algorithm.

| Modifier and Type | Field and Description |
|---|---|
| protected int[] | assignments - The current assignments of elements to clusters. |
| protected int[] | clusterCounts - The current number of elements assigned to each cluster. |
| protected java.util.ArrayList<ClusterType> | clusters - The current set of clusters. |
| static int | DEFAULT_MAX_ITERATIONS - The default maximum number of iterations is 1000. |
| static int | DEFAULT_NUM_REQUESTED_CLUSTERS - The default number of requested clusters is 10. |
| protected ClusterDivergenceFunction<? super ClusterType,? super DataType> | divergenceFunction - The divergence function used to compare a cluster with a data element. |
| protected FixedClusterInitializer<ClusterType,DataType> | initializer - The initializer for the algorithm. |
| protected int | numRequestedClusters - The number of clusters requested. |

Inherited fields: data, keepGoing, maxIterations, DEFAULT_ITERATION, iteration

| Constructor and Description |
|---|
| KMeansClusterer() - Creates a new instance of KMeansClusterer with default parameters. |
| KMeansClusterer(int numRequestedClusters, int maxIterations, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator) - Creates a new instance of KMeansClusterer using the given parameters. |

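
The five-argument constructor combines a cluster count, an iteration cap, and three pluggable pieces: an initializer, a divergence function, and a cluster creator. The sketch below illustrates how those three roles typically fit together in k-means. It is self-contained and does not use the library: the `Initializer`, `Divergence`, and `Creator` interfaces, the `cluster` method, and the choice to keep the previous centroid when a cluster becomes empty are all assumptions made for illustration, not the behavior of `KMeansClusterer` itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types only mirror the roles of the constructor
// arguments; they are not the library's interfaces.
final class KMeansRolesSketch {

    /** Role of the initializer: choose k starting centroids from the data. */
    interface Initializer {
        List<double[]> initialize(int k, List<double[]> data);
    }

    /** Role of the divergence function: distance from a cluster to an element. */
    interface Divergence {
        double evaluate(double[] centroid, double[] element);
    }

    /** Role of the cluster creator: build a centroid from its members. */
    interface Creator {
        double[] create(List<double[]> members);
    }

    static List<double[]> cluster(int k, int maxIterations, Initializer initializer,
            Divergence divergence, Creator creator, List<double[]> data) {
        List<double[]> centroids = new ArrayList<>(initializer.initialize(k, data));
        int[] assignments = new int[data.size()];
        Arrays.fill(assignments, -1); // -1 marks "not yet assigned"

        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment step: move each element to its closest centroid.
            int numChanged = 0;
            for (int i = 0; i < data.size(); i++) {
                int best = 0;
                double bestDivergence = divergence.evaluate(centroids.get(0), data.get(i));
                for (int c = 1; c < centroids.size(); c++) {
                    double d = divergence.evaluate(centroids.get(c), data.get(i));
                    if (d < bestDivergence) {
                        best = c;
                        bestDivergence = d;
                    }
                }
                if (assignments[i] != best) {
                    assignments[i] = best;
                    numChanged++;
                }
            }
            if (numChanged == 0) {
                break; // no element moved, so the clustering is stable
            }
            // Update step: rebuild each centroid from its current members.
            for (int c = 0; c < centroids.size(); c++) {
                List<double[]> members = new ArrayList<>();
                for (int i = 0; i < data.size(); i++) {
                    if (assignments[i] == c) {
                        members.add(data.get(i));
                    }
                }
                if (!members.isEmpty()) { // simplification: an empty cluster keeps its old centroid
                    centroids.set(c, creator.create(members));
                }
            }
        }
        return centroids;
    }

    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        }
        return sum;
    }

    static double[] mean(List<double[]> members) {
        double[] result = new double[members.get(0).length];
        for (double[] m : members) {
            for (int j = 0; j < result.length; j++) {
                result[j] += m[j];
            }
        }
        for (int j = 0; j < result.length; j++) {
            result[j] /= members.size();
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
                new double[] {0, 0}, new double[] {0, 2},
                new double[] {10, 10}, new double[] {10, 12});
        // Initializer: take the first k points as the starting centroids.
        List<double[]> result = cluster(2, 1000, (k, d) -> d.subList(0, k),
                KMeansRolesSketch::squaredEuclidean, KMeansRolesSketch::mean, data);
        for (double[] centroid : result) {
            System.out.println(Arrays.toString(centroid)); // [0.0, 1.0] then [10.0, 11.0]
        }
    }
}
```

Keeping the three roles behind small interfaces is what lets the same loop work for any data type and cluster representation, which is presumably why the real constructor takes them as separate arguments.
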
| Modifier and Type | Method and Description |
|---|---|
| protected java.util.ArrayList<java.util.ArrayList<DataType>> | assignDataFromIndices() - Puts the data into a list of lists, one per cluster, for subsequent estimation. |
| protected int[] | assignDataToClusters(java.util.Collection<? extends DataType> data) - Creates the cluster assignments given the current locations of clusters. |
| protected void | cleanupAlgorithm() - Called to clean up the learning algorithm's state after learning has finished. |
| KMeansClusterer<DataType,ClusterType> | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
| protected void | createClustersFromAssignments() - Creates the set of clusters using the current cluster assignments. |
| protected int[] | getAssignments() - Getter for assignments. |
| protected int | getClosestClusterIndex(DataType element) - Gets the index of the closest cluster for the given element. |
| protected ClusterType | getCluster(int index) - Gets the cluster for the given index. |
| protected int[] | getClusterCounts() - Getter for clusterCounts. |
| java.util.ArrayList<ClusterType> | getClusters() - Getter for clusters. |
| ClusterCreator<ClusterType,DataType> | getCreator() - Gets the cluster creator. |
| ClusterDivergenceFunction<? super ClusterType,? super DataType> | getDivergenceFunction() - Gets the divergence function used in clustering. |
| FixedClusterInitializer<ClusterType,DataType> | getInitializer() - Gets the cluster initializer. |
| int | getNumChanged() - Getter for numChanged. |
| protected int | getNumClusters() - Gets the actual number of clusters that were created. |
| int | getNumElements() - Returns the number of elements. |
| int | getNumRequestedClusters() - Gets the number of clusters that were requested. |
| NamedValue<java.lang.Integer> | getPerformance() - Gets the performance, which is the number changed on the last iteration. |
| java.util.ArrayList<ClusterType> | getResult() - Gets the current result of the algorithm. |
| protected boolean | initializeAlgorithm() - Called to initialize the learning algorithm's state based on the data that is stored in the data field. |
| protected boolean | setAssignment(int elementIndex, int newClusterIndex) - Sets the assignment of the given element to the new cluster index, updating the cluster counts as well. |
| protected void | setClusters(java.util.ArrayList<ClusterType> clusters) - Sets the clusters. |
| void | setCreator(ClusterCreator<ClusterType,DataType> creator) - Sets the cluster creator. |
| void | setData(java.util.Collection<? extends DataType> data) - Sets the data to use for learning. |
| void | setDivergenceFunction(ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction) - Sets the divergence function. |
| void | setInitializer(FixedClusterInitializer<ClusterType,DataType> initializer) - Sets the cluster initializer. |
| protected void | setNumChanged(int numChanged) - Setter for numChanged. |
| void | setNumRequestedClusters(int numRequestedClusters) - Sets the number of requested clusters. |
| protected boolean | step() - Do a step of the clustering algorithm. |

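
Several of the methods above (`setAssignment`, `getClosestClusterIndex`, `getClusterCounts`, `getNumChanged`) describe bookkeeping that is easier to see in code. The following is a minimal sketch of that bookkeeping on one-dimensional data. It is not the library's implementation: the class name, the use of `-1` for "unassigned", and the reading of the `boolean` return value as "the assignment changed" are assumptions for illustration only.

```java
import java.util.Arrays;

// Hypothetical sketch of assignment bookkeeping; not the library's code.
final class AssignmentBookkeepingSketch {
    final int[] assignments;   // cluster index per element; -1 means unassigned (assumption)
    final int[] clusterCounts; // number of elements currently in each cluster

    AssignmentBookkeepingSketch(int numElements, int numClusters) {
        assignments = new int[numElements];
        Arrays.fill(assignments, -1);
        clusterCounts = new int[numClusters];
    }

    /** Moves one element; returns true only if its cluster changed (assumed meaning). */
    boolean setAssignment(int elementIndex, int newClusterIndex) {
        int oldClusterIndex = assignments[elementIndex];
        if (oldClusterIndex == newClusterIndex) {
            return false;
        }
        if (oldClusterIndex >= 0) {
            clusterCounts[oldClusterIndex]--; // leave the old cluster
        }
        clusterCounts[newClusterIndex]++;     // join the new cluster
        assignments[elementIndex] = newClusterIndex;
        return true;
    }

    /** Index of the nearest center for a 1-D element, by absolute difference. */
    static int closestClusterIndex(double[] centers, double element) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(centers[c] - element) < Math.abs(centers[best] - element)) {
                best = c;
            }
        }
        return best;
    }

    /** One assignment pass; returns the number of elements that changed cluster. */
    int assignAll(double[] centers, double[] data) {
        int numChanged = 0;
        for (int i = 0; i < data.length; i++) {
            if (setAssignment(i, closestClusterIndex(centers, data[i]))) {
                numChanged++;
            }
        }
        return numChanged;
    }

    public static void main(String[] args) {
        AssignmentBookkeepingSketch sketch = new AssignmentBookkeepingSketch(4, 2);
        double[] data = {1, 2, 9, 11};
        System.out.println(sketch.assignAll(new double[] {0, 10}, data)); // 4
        System.out.println(Arrays.toString(sketch.clusterCounts));        // [2, 2]
        System.out.println(sketch.assignAll(new double[] {0, 10}, data)); // 0
    }
}
```

A pass that reports zero changes is the natural stopping signal, which matches the summary's description of the performance measure as the number changed on the last iteration.
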
Inherited methods, grouped by declaring type:
getData, getKeepGoing, learn, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid

public static final int DEFAULT_NUM_REQUESTED_CLUSTERS
public static final int DEFAULT_MAX_ITERATIONS
protected int numRequestedClusters
protected FixedClusterInitializer<ClusterType,DataType> initializer
protected ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction
protected java.util.ArrayList<ClusterType> clusters
protected int[] assignments
protected int[] clusterCounts

public KMeansClusterer()
Creates a new instance of KMeansClusterer with default parameters.

public KMeansClusterer(int numRequestedClusters, int maxIterations, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator)
Parameters:
numRequestedClusters - The number of clusters requested (k).
maxIterations - Maximum number of iterations before stopping.
initializer - The initializer for the clusters.
divergenceFunction - The divergence function.
creator - The cluster creator.

public KMeansClusterer<DataType,ClusterType> clone()
Description copied from class: AbstractCloneableSerializableObject
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy.
Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>

protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field.
initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>

protected boolean step()
step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>

protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.
cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>

protected int[] assignDataToClusters(java.util.Collection<? extends DataType> data)
Parameters:
data - Data to assign.

public void setData(java.util.Collection<? extends DataType> data)
Description copied from class: AbstractAnytimeBatchLearner
Sets the data to use for learning.
setData in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
Parameters:
data - The data to use for learning.

protected java.util.ArrayList<java.util.ArrayList<DataType>> assignDataFromIndices()

protected void createClustersFromAssignments()
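
The review note on this class mentions that the condition in createClustersFromAssignments() was changed to `members.size() > 0`, which suggests that a cluster is only built when at least one element is assigned to it. The sketch below illustrates that idea on one-dimensional data, pairing a grouping step (the role of assignDataFromIndices()) with a creation step. The class and method names are hypothetical, the mean is a stand-in for whatever the configured cluster creator does, and dropping empty groups is an inference from the review note rather than confirmed library behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the library's implementation.
final class ClustersFromAssignmentsSketch {

    /** Groups elements into one member list per cluster index. */
    static List<List<Double>> groupByAssignment(double[] data, int[] assignments,
            int numClusters) {
        List<List<Double>> groups = new ArrayList<>();
        for (int c = 0; c < numClusters; c++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < data.length; i++) {
            groups.get(assignments[i]).add(data[i]);
        }
        return groups;
    }

    /** Builds one centroid (the mean) per non-empty group; empty groups are skipped. */
    static List<Double> createCentroids(List<List<Double>> groups) {
        List<Double> centroids = new ArrayList<>();
        for (List<Double> members : groups) {
            if (members.size() > 0) { // the condition quoted in the review note
                double sum = 0.0;
                for (double m : members) {
                    sum += m;
                }
                centroids.add(sum / members.size());
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 10};
        int[] assignments = {0, 0, 2}; // nothing is assigned to cluster 1
        List<List<Double>> groups = groupByAssignment(data, assignments, 3);
        System.out.println(createCentroids(groups)); // [2.0, 10.0]
    }
}
```

Under this reading, the number of clusters actually created can be smaller than the number requested, which would explain why the class exposes both getNumClusters() and getNumRequestedClusters().
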
protected int getClosestClusterIndex(DataType element)
Parameters:
element - The element to get the closest cluster for.

protected boolean setAssignment(int elementIndex, int newClusterIndex)
Parameters:
elementIndex - The index of the element.
newClusterIndex - The new cluster the element is assigned to.

protected ClusterType getCluster(int index)
Parameters:
index - The index of the cluster.

protected int getNumClusters()
public int getNumRequestedClusters()
public FixedClusterInitializer<ClusterType,DataType> getInitializer()
public ClusterDivergenceFunction<? super ClusterType,? super DataType> getDivergenceFunction()
getDivergenceFunction in interface DivergenceFunctionContainer<ClusterType,DataType>

public ClusterCreator<ClusterType,DataType> getCreator()

public void setNumRequestedClusters(int numRequestedClusters)
Parameters:
numRequestedClusters - The number of requested clusters.

public void setInitializer(FixedClusterInitializer<ClusterType,DataType> initializer)
Parameters:
initializer - The cluster initializer.

public void setDivergenceFunction(ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction)
Parameters:
divergenceFunction - The divergence function.

public void setCreator(ClusterCreator<ClusterType,DataType> creator)
Parameters:
creator - The creator for clusters.

public int getNumElements()

protected void setClusters(java.util.ArrayList<ClusterType> clusters)
Parameters:
clusters - The clusters.

public java.util.ArrayList<ClusterType> getClusters()
public java.util.ArrayList<ClusterType> getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.
getResult in interface AnytimeAlgorithm<java.util.Collection<ClusterType>>

protected int[] getAssignments()

protected int[] getClusterCounts()

public int getNumChanged()

protected void setNumChanged(int numChanged)
Parameters:
numChanged - The number of samples that changed assignment between iterations.

public NamedValue<java.lang.Integer> getPerformance()
getPerformance in interface MeasurablePerformanceAlgorithm