DataType
- The type of the data to cluster. This is typically defined by the divergence function used.
ClusterType
- The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.
@CodeReview(reviewer="Kevin R. Dixon",date="2008-10-06",changesNeeded=true,comments={"The constructors for this class are not user friendly.","I\'ve been trying to write a test GUI for k-means for over an hour and STILL can\'t figure out the combination of classes to configure the constructor.","Please make a constructor that configures the class with meaningful, user-friendly default arguments."})
@CodeReview(reviewer="Kevin R. Dixon",date="2008-07-22",changesNeeded=false,comments={"Changed the condition to be \'members.size() > 0\' instead of 1 in createClustersFromAssignments()","Cleaned up javadoc.","Code generally looks fine."})
@PublicationReference(author="Wikipedia",title="K-means algorithm",type=WebPage,year=2008,url="http://en.wikipedia.org/wiki/K-means_algorithm")
@PublicationReference(author="Matteo Matteucci",title="A Tutorial on Clustering Algorithms: k-means Demo",type=WebPage,year=2008,url="http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html")
public class KMeansClusterer<DataType,ClusterType extends Cluster<DataType>>
extends AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
implements BatchClusterer<DataType,ClusterType>, MeasurablePerformanceAlgorithm, DivergenceFunctionContainer<ClusterType,DataType>
The KMeansClusterer class implements the standard k-means (k-centroids) clustering algorithm.
Modifier and Type | Field and Description |
---|---|
protected int[] | assignments - The current assignments of elements to clusters. |
protected int[] | clusterCounts - The current number of elements assigned to each cluster. |
protected java.util.ArrayList<ClusterType> | clusters - The current set of clusters. |
static int | DEFAULT_MAX_ITERATIONS - The default maximum number of iterations is 1000. |
static int | DEFAULT_NUM_REQUESTED_CLUSTERS - The default number of requested clusters is 10. |
protected ClusterDivergenceFunction<? super ClusterType,? super DataType> | divergenceFunction - The cluster divergence function being used. |
protected FixedClusterInitializer<ClusterType,DataType> | initializer - The initializer for the algorithm. |
protected int | numRequestedClusters - The number of clusters requested. |
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration
Constructor and Description |
---|
KMeansClusterer() - Creates a new instance of KMeansClusterer with default parameters. |
KMeansClusterer(int numRequestedClusters, int maxIterations, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator) - Creates a new instance of KMeansClusterer using the given parameters. |
Modifier and Type | Method and Description |
---|---|
protected java.util.ArrayList<java.util.ArrayList<DataType>> | assignDataFromIndices() - Puts the data into a list of lists for each cluster to then estimate |
protected int[] | assignDataToClusters(java.util.Collection<? extends DataType> data) - Creates the cluster assignments given the current locations of clusters. |
protected void | cleanupAlgorithm() - Called to clean up the learning algorithm's state after learning has finished. |
KMeansClusterer<DataType,ClusterType> | clone() - This makes public the clone method on the Object class and removes the exception that it throws. |
protected void | createClustersFromAssignments() - Creates the set of clusters using the current cluster assignments. |
protected int[] | getAssignments() - Getter for assignments. |
protected int | getClosestClusterIndex(DataType element) - Gets the index of the closest cluster for the given element. |
protected ClusterType | getCluster(int index) - Gets the cluster for the given index. |
protected int[] | getClusterCounts() - Getter for clusterCounts. |
java.util.ArrayList<ClusterType> | getClusters() - Getter for clusters. |
ClusterCreator<ClusterType,DataType> | getCreator() - Gets the cluster creator. |
ClusterDivergenceFunction<? super ClusterType,? super DataType> | getDivergenceFunction() - Gets the divergence function used in clustering. |
FixedClusterInitializer<ClusterType,DataType> | getInitializer() - Gets the cluster initializer. |
int | getNumChanged() - Getter for numChanged. |
protected int | getNumClusters() - Gets the actual number of clusters that were created. |
int | getNumElements() - Returns the number of elements. |
int | getNumRequestedClusters() - Gets the number of clusters that were requested. |
NamedValue<java.lang.Integer> | getPerformance() - Gets the performance, which is the number changed on the last iteration. |
java.util.ArrayList<ClusterType> | getResult() - Gets the current result of the algorithm. |
protected boolean | initializeAlgorithm() - Called to initialize the learning algorithm's state based on the data that is stored in the data field. |
protected boolean | setAssignment(int elementIndex, int newClusterIndex) - Sets the assignment of the given element to the new cluster index, updating the cluster counts as well. |
protected void | setClusters(java.util.ArrayList<ClusterType> clusters) - Sets the clusters. |
void | setCreator(ClusterCreator<ClusterType,DataType> creator) - Sets the cluster creator. |
void | setData(java.util.Collection<? extends DataType> data) - Sets the data to use for learning. |
void | setDivergenceFunction(ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction) - Sets the divergence function. |
void | setInitializer(FixedClusterInitializer<ClusterType,DataType> initializer) - Sets the cluster initializer. |
protected void | setNumChanged(int numChanged) - Setter for numChanged. |
void | setNumRequestedClusters(int numRequestedClusters) - Sets the number of requested clusters. |
protected boolean | step() - Do a step of the clustering algorithm. |
getData, getKeepGoing, learn, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid
public static final int DEFAULT_NUM_REQUESTED_CLUSTERS
public static final int DEFAULT_MAX_ITERATIONS
protected int numRequestedClusters
protected FixedClusterInitializer<ClusterType,DataType> initializer
protected ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction
protected java.util.ArrayList<ClusterType> clusters
protected int[] assignments
protected int[] clusterCounts
public KMeansClusterer()
Creates a new instance of KMeansClusterer with default parameters.
public KMeansClusterer(int numRequestedClusters, int maxIterations, FixedClusterInitializer<ClusterType,DataType> initializer, ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction, ClusterCreator<ClusterType,DataType> creator)
numRequestedClusters
- The number of clusters requested (k).
maxIterations
- Maximum number of iterations before stopping.
initializer
- The initializer for the clusters.
divergenceFunction
- The divergence function.
creator
- The cluster creator.
public KMeansClusterer<DataType,ClusterType> clone()
AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy.
Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method.
Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
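The "smart copy" contract described for clone() can be illustrated with a minimal, self-contained sketch. CloneSketch and its fields are hypothetical names invented for this example; they are not part of the library. The sketch starts from super.clone(), which copies primitives and shares references, and then deepens only the one field that should not be shared.

```java
import java.util.ArrayList;

// Hypothetical class used only to illustrate the "smart copy" pattern.
final class CloneSketch implements Cloneable {
    // Primitive field: copied by value by super.clone().
    private int maxIterations = 1000;
    // Reference field: shared between original and copy unless deepened.
    private ArrayList<double[]> centers = new ArrayList<>();

    int getMaxIterations() { return maxIterations; }

    ArrayList<double[]> getCenters() { return centers; }

    @Override
    public CloneSketch clone() {
        try {
            // Use super.clone() as the basis, as the contract requires.
            CloneSketch copy = (CloneSketch) super.clone();
            // Deepen only the field that must not be shared.
            copy.centers = new ArrayList<>();
            for (double[] center : this.centers) {
                copy.centers.add(center.clone());
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}
```

Because the public override declares no checked exception, callers can write `CloneSketch copy = original.clone();` without a try block, which is the convenience the description above refers to.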
protected boolean initializeAlgorithm()
AbstractAnytimeBatchLearner
initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
protected boolean step()
step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
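As a rough illustration of what one step of standard k-means involves, the following self-contained sketch runs the algorithm on one-dimensional data. It is not the library's implementation: KMeansStepSketch, its fields, and its learn(int) loop are hypothetical, and the convention that step() returns true while assignments are still changing is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical 1-D k-means loop; names and stopping rule are assumptions.
final class KMeansStepSketch {
    final double[] data;
    final double[] centers;   // one center per cluster
    final int[] assignments;  // cluster index per element, -1 = unassigned
    int numChanged;           // elements reassigned in the last step

    KMeansStepSketch(double[] data, double[] initialCenters) {
        this.data = data.clone();
        this.centers = initialCenters.clone();
        this.assignments = new int[data.length];
        Arrays.fill(this.assignments, -1);
    }

    /** Runs one iteration; returns true if any assignment changed. */
    boolean step() {
        // Assignment step: move each element to its closest center.
        numChanged = 0;
        for (int i = 0; i < data.length; i++) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(data[i] - centers[c]) < Math.abs(data[i] - centers[best])) {
                    best = c;
                }
            }
            if (assignments[i] != best) {
                assignments[i] = best;
                numChanged++;
            }
        }
        // Update step: recompute each center as the mean of its members.
        double[] sums = new double[centers.length];
        int[] counts = new int[centers.length];
        for (int i = 0; i < data.length; i++) {
            sums[assignments[i]] += data[i];
            counts[assignments[i]]++;
        }
        for (int c = 0; c < centers.length; c++) {
            if (counts[c] > 0) {
                centers[c] = sums[c] / counts[c];
            }
        }
        return numChanged > 0;
    }

    /** Calls step() until nothing changes or maxIterations is reached. */
    int learn(int maxIterations) {
        int iteration = 0;
        while (iteration < maxIterations) {
            iteration++;
            if (!step()) {
                break;
            }
        }
        return iteration;
    }
}
```

The numChanged counter plays the role that the summary above assigns to getPerformance(): the number of elements whose assignment changed on the last iteration.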
protected void cleanupAlgorithm()
AbstractAnytimeBatchLearner
cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
protected int[] assignDataToClusters(java.util.Collection<? extends DataType> data)
data
- Data to assign.
public void setData(java.util.Collection<? extends DataType> data)
AbstractAnytimeBatchLearner
setData in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
data
- The data to use for learning.
protected java.util.ArrayList<java.util.ArrayList<DataType>> assignDataFromIndices()
protected void createClustersFromAssignments()
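The two methods above, assignDataFromIndices() and createClustersFromAssignments(), appear to work as a pair: group the elements by assigned cluster index, then build a cluster from each group. The sketch below shows that idea with plain arrays and mean centroids. ClusterCreationSketch and its method names are hypothetical, and skipping empty groups entirely is an assumption; the source only records, in the code review note, that the condition used is `members.size() > 0`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; mean centroids stand in for the cluster creator.
final class ClusterCreationSketch {
    /** Groups elements into one list per cluster index. */
    static List<List<double[]>> groupByAssignment(
            List<double[]> data, int[] assignments, int numClusters) {
        List<List<double[]>> members = new ArrayList<>();
        for (int c = 0; c < numClusters; c++) {
            members.add(new ArrayList<>());
        }
        for (int i = 0; i < data.size(); i++) {
            members.get(assignments[i]).add(data.get(i));
        }
        return members;
    }

    /** Builds one centroid per non-empty group (assumed behavior). */
    static List<double[]> createCentroids(List<List<double[]>> members) {
        List<double[]> centroids = new ArrayList<>();
        for (List<double[]> group : members) {
            if (group.size() > 0) {
                int dim = group.get(0).length;
                double[] mean = new double[dim];
                for (double[] point : group) {
                    for (int d = 0; d < dim; d++) {
                        mean[d] += point[d];
                    }
                }
                for (int d = 0; d < dim; d++) {
                    mean[d] /= group.size();
                }
                centroids.add(mean);
            }
        }
        return centroids;
    }
}
```

Under this assumption, the number of clusters actually created can be smaller than the number requested, which would be consistent with the class exposing both getNumClusters() and getNumRequestedClusters().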
protected int getClosestClusterIndex(DataType element)
element
- The element to get the closest cluster for.
protected boolean setAssignment(int elementIndex, int newClusterIndex)
elementIndex
- The index of the element.
newClusterIndex
- The new cluster the element is assigned to.
protected ClusterType getCluster(int index)
index
- The index of the cluster.
protected int getNumClusters()
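The bookkeeping that setAssignment(int, int) is described as doing, changing an element's cluster index and updating the cluster counts together, could look like the sketch below. AssignmentSketch is a hypothetical class; the use of -1 for "not yet assigned" and the meaning of the boolean return value (true when the assignment actually changed) are assumptions, since the source does not document either.

```java
import java.util.Arrays;

// Hypothetical bookkeeping class; sentinel and return value are assumed.
final class AssignmentSketch {
    final int[] assignments;    // cluster index per element
    final int[] clusterCounts;  // number of elements per cluster

    AssignmentSketch(int numElements, int numClusters) {
        assignments = new int[numElements];
        Arrays.fill(assignments, -1); // -1 marks "not yet assigned"
        clusterCounts = new int[numClusters];
    }

    /** Reassigns one element; returns true if its cluster changed. */
    boolean setAssignment(int elementIndex, int newClusterIndex) {
        int oldClusterIndex = assignments[elementIndex];
        if (oldClusterIndex == newClusterIndex) {
            return false; // nothing to update
        }
        if (oldClusterIndex >= 0) {
            clusterCounts[oldClusterIndex]--; // leave the old cluster
        }
        clusterCounts[newClusterIndex]++;     // join the new cluster
        assignments[elementIndex] = newClusterIndex;
        return true;
    }
}
```

Keeping both arrays behind a single method means the counts cannot drift out of step with the assignments, which is presumably why the class offers one combined setter rather than separate ones.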
public int getNumRequestedClusters()
public FixedClusterInitializer<ClusterType,DataType> getInitializer()
public ClusterDivergenceFunction<? super ClusterType,? super DataType> getDivergenceFunction()
getDivergenceFunction in interface DivergenceFunctionContainer<ClusterType,DataType>
public ClusterCreator<ClusterType,DataType> getCreator()
public void setNumRequestedClusters(int numRequestedClusters)
numRequestedClusters
- The number of requested clusters.
public void setInitializer(FixedClusterInitializer<ClusterType,DataType> initializer)
initializer
- The cluster initializer.
public void setDivergenceFunction(ClusterDivergenceFunction<? super ClusterType,? super DataType> divergenceFunction)
divergenceFunction
- The divergence function.
public void setCreator(ClusterCreator<ClusterType,DataType> creator)
creator
- The creator for clusters.
public int getNumElements()
protected void setClusters(java.util.ArrayList<ClusterType> clusters)
clusters
- The clusters.
public java.util.ArrayList<ClusterType> getClusters()
public java.util.ArrayList<ClusterType> getResult()
AnytimeAlgorithm
getResult in interface AnytimeAlgorithm<java.util.Collection<ClusterType>>
protected int[] getAssignments()
protected int[] getClusterCounts()
public int getNumChanged()
protected void setNumChanged(int numChanged)
numChanged
- The number of samples that changed assignment between iterations.
public NamedValue<java.lang.Integer> getPerformance()
getPerformance in interface MeasurablePerformanceAlgorithm