DataType - The type of data the algorithm is to cluster, which it
passes to the divergence function. For example, this could be
Vector or String.

@CodeReview(reviewer="Kevin R. Dixon", date="2008-07-22", changesNeeded=false, comments={"Removed transient declaration on members.","Fixed a few typos in javadoc.","Added PublicationReference annotation.","Added comment about use of direct-member access.","Code generally looked fine."})
@PublicationReference(author={"Brendan J. Frey","Delbert Dueck"}, title="Clustering by Passing Messages Between Data Points.", type=Journal, publication="Science", notes="Volume 315, number 5814", pages={972,976}, year=2007)
public class AffinityPropagation<DataType>
extends AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<CentroidCluster<DataType>>>
implements BatchClusterer<DataType,CentroidCluster<DataType>>, MeasurablePerformanceAlgorithm, DivergenceFunctionContainer<DataType,DataType>
The AffinityPropagation algorithm requires three parameters:
a divergence function, a value to use for self-divergence, and a damping
factor (called lambda in the paper; 0.5 is the default). It clusters by
passing messages between points to determine the best exemplar for each
point.
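For orientation, the message-passing rules from the cited Frey and Dueck paper can be written as below. This is a summary of the published algorithm, not a transcription of this class's code. Here s(i,k) is the similarity of example i to candidate exemplar k, r is the responsibility, a is the availability, and lambda is the damping factor. Taking s(i,k) as the negated divergence, with the negated self-divergence on the diagonal, is an assumption about how this class maps divergences to similarities.

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl[ a(i,k') + s(i,k') \bigr] \\
a(i,k) &\leftarrow \min\Bigl(0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr)\Bigr), \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr) \\
m_{t} &= \lambda\, m_{t-1} + (1 - \lambda)\, \hat{m}_{t}
  \qquad \text{(damping applied to each message } m \in \{r, a\}\text{)} \\
c_i &= \arg\max_{k} \bigl[ a(i,k) + r(i,k) \bigr]
  \qquad \text{(exemplar chosen for example } i\text{)}
\end{aligned}
```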
Modifier and Type | Field and Description |
---|---|
protected int[] |
assignments
The assignments of each example to an exemplar (cluster).
|
protected double[][] |
availabilities
The array of example-example availabilities.
|
protected int |
changedCount
The number of examples that have changed assignments in the last
iteration.
|
protected java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> |
clusters
The clusters that have been found so far.
|
protected double |
dampingFactor
The damping factor (lambda).
|
static double |
DEFAULT_DAMPING_FACTOR
The default damping factor (lambda) is 0.5.
|
static int |
DEFAULT_MAX_ITERATIONS
The default maximum number of iterations is 100.
|
static double |
DEFAULT_SELF_DIVERGENCE
The default self-divergence is 0.0.
|
protected DivergenceFunction<? super DataType,? super DataType> |
divergence
The divergence function to use.
|
protected int |
exampleCount
The number of examples.
|
protected java.util.ArrayList<DataType> |
examples
The examples.
|
protected double |
oneMinusDampingFactor
The cached value of one minus the damping factor.
|
protected double[][] |
responsibilities
The array of example-example responsibilities.
|
protected double[][] |
similarities
The array of example-example similarities.
|
Fields inherited from superclasses:
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration
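The similarities field holds one value per pair of examples. A minimal sketch of how such a matrix could be filled from a divergence function is shown here. It assumes that similarity is the negated divergence and that the diagonal holds the negated self-divergence; the class documentation above does not state this mapping. SimilaritySketch and buildSimilarities are hypothetical names, and java.util.function.ToDoubleBiFunction stands in for the library's DivergenceFunction so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper for illustration; not part of the documented API.
final class SimilaritySketch {

    /** Builds an n-by-n similarity matrix from a pairwise divergence. */
    static <T> double[][] buildSimilarities(
            List<T> examples,
            ToDoubleBiFunction<? super T, ? super T> divergence,
            double selfDivergence) {
        int n = examples.size();
        double[][] similarities = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // Assumption: similarity = -divergence, and the diagonal
                // uses the configured self-divergence instead of d(x, x).
                similarities[i][j] = (i == j)
                        ? -selfDivergence
                        : -divergence.applyAsDouble(examples.get(i), examples.get(j));
            }
        }
        return similarities;
    }

    public static void main(String[] args) {
        List<Double> points = List.of(0.0, 1.0, 10.0);
        double[][] s = buildSimilarities(points, (a, b) -> Math.abs(a - b), 2.0);
        System.out.println(Arrays.deepToString(s));
    }
}
```

Under this assumption, a larger self-divergence makes every example less attractive as its own exemplar, which is consistent with the documented statement that self-divergence controls how many clusters are generated.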
Constructor and Description |
---|
AffinityPropagation()
Creates a new instance of AffinityPropagation.
|
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
double selfDivergence)
Creates a new instance of AffinityPropagation.
|
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
double selfDivergence,
double dampingFactor)
Creates a new instance of AffinityPropagation.
|
AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence,
double selfDivergence,
double dampingFactor,
int maxIterations)
Creates a new instance of AffinityPropagation.
|
Modifier and Type | Method and Description |
---|---|
protected void |
assignCluster(int i,
int newAssignment)
Assigns example "i" to the new cluster index.
|
protected void |
cleanupAlgorithm()
Called to clean up the learning algorithm's state after learning has
finished.
|
AffinityPropagation<DataType> |
clone()
This makes public the clone method on the Object class and
removes the exception that it throws.
|
protected int[] |
getAssignments()
Gets the assignments of examples to exemplars (clusters).
|
protected double[][] |
getAvailabilities()
Gets the availability values.
|
int |
getChangedCount()
Gets the number of cluster assignments that have changed in the most
recent iteration.
|
protected java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> |
getClusters()
Gets the current clusters, which is a sparse mapping of exemplar
identifier to cluster object.
|
double |
getDampingFactor()
Gets the damping factor.
|
DivergenceFunction<? super DataType,? super DataType> |
getDivergence()
Gets the divergence function used by the algorithm.
|
DivergenceFunction<? super DataType,? super DataType> |
getDivergenceFunction()
Gets the divergence function used by this object.
|
protected java.util.ArrayList<DataType> |
getExamples()
Gets the array list of examples to cluster.
|
NamedValue<java.lang.Integer> |
getPerformance()
Gets the performance, which is the number changed on the last iteration.
|
protected double[][] |
getResponsibilities()
Gets the responsibility values.
|
java.util.ArrayList<CentroidCluster<DataType>> |
getResult()
Gets the current result of the algorithm.
|
double |
getSelfDivergence()
Gets the value used for self-divergence, which controls how many
clusters are generated.
|
protected double[][] |
getSimilarities()
Gets the array of similarities.
|
protected boolean |
initializeAlgorithm()
Called to initialize the learning algorithm's state based on the
data that is stored in the data field.
|
protected void |
setAssignments(int[] assignments)
Sets the assignments of examples to exemplars (clusters).
|
protected void |
setAvailabilities(double[][] availabilities)
Sets the availability values.
|
protected void |
setChangedCount(int changedCount)
Sets the number of cluster assignments that have changed in the most
recent iteration.
|
protected void |
setClusters(java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> clusters)
Sets the current clusters, which is a sparse mapping of exemplar
identifier to cluster object.
|
void |
setDampingFactor(double dampingFactor)
Sets the damping factor.
|
void |
setDivergence(DivergenceFunction<? super DataType,? super DataType> divergence)
Sets the divergence function used by the algorithm.
|
protected void |
setExamples(java.util.ArrayList<DataType> examples)
Sets the array list of examples to cluster.
|
protected void |
setResponsibilities(double[][] responsibilities)
Sets the responsibility values.
|
void |
setSelfDivergence(double selfDivergence)
Sets the value used for self-divergence, which controls how many
clusters are generated.
|
protected void |
setSimilarities(double[][] similarities)
Sets the array of similarities.
|
protected boolean |
step()
Called to take a single step of the learning algorithm.
|
protected void |
updateAssignments()
Updates the assignments of all the examples to their exemplars (clusters)
using the current availability and responsibility values.
|
protected void |
updateAvailabilities()
Updates the availabilities matrix based on the current responsibility
values.
|
protected void |
updateResponsibilities()
Updates the responsibilities matrix using the similarity values and the
current availability values.
|
Methods inherited from superclasses:
getData, getKeepGoing, learn, setData, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
Methods inherited from class java.lang.Object:
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interfaces:
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid
public static final int DEFAULT_MAX_ITERATIONS
public static final double DEFAULT_SELF_DIVERGENCE
public static final double DEFAULT_DAMPING_FACTOR
protected DivergenceFunction<? super DataType,? super DataType> divergence
protected double dampingFactor
protected double oneMinusDampingFactor
protected transient int exampleCount
protected java.util.ArrayList<DataType> examples
protected double[][] similarities
protected double[][] responsibilities
protected double[][] availabilities
protected int[] assignments
protected int changedCount
protected java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> clusters
public AffinityPropagation()
public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence)
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.
public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence, double dampingFactor)
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.
dampingFactor - The damping factor (lambda). Must be between 0.0 and 1.0.
public AffinityPropagation(DivergenceFunction<? super DataType,? super DataType> divergence, double selfDivergence, double dampingFactor, int maxIterations)
divergence - The divergence function to use to determine the divergence between two examples.
selfDivergence - The value for self-divergence to use, which controls the number of clusters created.
dampingFactor - The damping factor (lambda). Must be between 0.0 and 1.0.
maxIterations - The maximum number of iterations.
public AffinityPropagation<DataType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and
removes the exception that it throws. Its default behavior is to
automatically create a clone of the exact type of object that the
clone is called on and to copy all primitives but to keep all references,
which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, to target
the most common use case for creating a copy of the object. Because
the default behavior is a shallow copy, extending classes only need
to handle fields that need a deeper copy (or those that need to
be reset). Some of the methods in ObjectUtil may be helpful in
implementing a custom clone method.
Note: The contract of this method is that you must use
super.clone() as the basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<CentroidCluster<DataType>>>
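protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the
data that is stored in the data field.
initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<CentroidCluster<DataType>>>
protected boolean step()
Description copied from class: AbstractAnytimeBatchLearner
Called to take a single step of the learning algorithm.
step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<CentroidCluster<DataType>>>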
protected void updateResponsibilities()
protected void updateAvailabilities()
protected void updateAssignments()
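The three update methods above correspond to the three phases of an affinity propagation iteration. The following is a self-contained sketch of those phases on plain arrays, following the published update rules with damping. It is not the library's implementation: AffinityPropagationSketch, negatedDistances, and cluster are hypothetical names, the similarity matrix is assumed to be negated absolute distance with the negated self-divergence on the diagonal, and the sketch runs a fixed number of iterations rather than using the class's stopping logic. It assumes at least two examples.

```java
import java.util.Arrays;

// Hypothetical, self-contained sketch; not the library's implementation.
final class AffinityPropagationSketch {

    /** Similarity = negated |x_i - x_j|; diagonal = negated self-divergence (assumption). */
    static double[][] negatedDistances(double[] x, double selfDivergence) {
        int n = x.length;
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                s[i][j] = (i == j) ? -selfDivergence : -Math.abs(x[i] - x[j]);
            }
        }
        return s;
    }

    /** Runs damped message passing and returns the exemplar index chosen for each example. */
    static int[] cluster(double[][] s, double lambda, int iterations) {
        int n = s.length;
        double[][] r = new double[n][n]; // responsibilities
        double[][] a = new double[n][n]; // availabilities
        for (int iter = 0; iter < iterations; iter++) {
            // Phase 1: responsibilities from similarities and current availabilities.
            for (int i = 0; i < n; i++) {
                for (int k = 0; k < n; k++) {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int other = 0; other < n; other++) {
                        if (other != k) {
                            best = Math.max(best, a[i][other] + s[i][other]);
                        }
                    }
                    r[i][k] = lambda * r[i][k] + (1.0 - lambda) * (s[i][k] - best);
                }
            }
            // Phase 2: availabilities from the freshly updated responsibilities.
            for (int k = 0; k < n; k++) {
                double positiveSum = 0.0; // sum of max(0, r(i', k)) over i' != k
                for (int i = 0; i < n; i++) {
                    if (i != k) {
                        positiveSum += Math.max(0.0, r[i][k]);
                    }
                }
                for (int i = 0; i < n; i++) {
                    double update = (i == k)
                            ? positiveSum
                            : Math.min(0.0, r[k][k] + positiveSum - Math.max(0.0, r[i][k]));
                    a[i][k] = lambda * a[i][k] + (1.0 - lambda) * update;
                }
            }
        }
        // Phase 3: each example picks the k that maximizes a(i,k) + r(i,k).
        int[] assignments = new int[n];
        for (int i = 0; i < n; i++) {
            int bestK = 0;
            for (int k = 1; k < n; k++) {
                if (a[i][k] + r[i][k] > a[i][bestK] + r[i][bestK]) {
                    bestK = k;
                }
            }
            assignments[i] = bestK;
        }
        return assignments;
    }

    public static void main(String[] args) {
        double[] points = {0, 1, 2, 10, 11, 12};
        int[] exemplars = cluster(negatedDistances(points, 3.0), 0.5, 100);
        System.out.println(Arrays.toString(exemplars));
    }
}
```

On this toy input, with two well-separated groups of three points, the sketch is expected to assign the first three points to one exemplar and the last three to another. The damping step (lambda times the old message plus one minus lambda times the new one) is what the cached oneMinusDampingFactor field would serve in such a loop.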
protected void assignCluster(int i, int newAssignment)
i - The index of the example to assign to the cluster.
newAssignment - The new assignment for "i".
protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has
finished.
cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<CentroidCluster<DataType>>>
public java.util.ArrayList<CentroidCluster<DataType>> getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.
getResult in interface AnytimeAlgorithm<java.util.Collection<CentroidCluster<DataType>>>
public DivergenceFunction<? super DataType,? super DataType> getDivergence()
public void setDivergence(DivergenceFunction<? super DataType,? super DataType> divergence)
divergence - The divergence function.
public double getSelfDivergence()
public void setSelfDivergence(double selfDivergence)
selfDivergence - The value for self-divergence.
public double getDampingFactor()
public void setDampingFactor(double dampingFactor)
dampingFactor - The damping factor. Must be between 0.0 and 1.0.
protected java.util.ArrayList<DataType> getExamples()
protected void setExamples(java.util.ArrayList<DataType> examples)
examples - The array list of examples to cluster.
protected double[][] getSimilarities()
protected void setSimilarities(double[][] similarities)
similarities - The array of similarities.
protected double[][] getResponsibilities()
protected void setResponsibilities(double[][] responsibilities)
responsibilities - The responsibilities.
protected double[][] getAvailabilities()
protected void setAvailabilities(double[][] availabilities)
availabilities - The availabilities.
protected int[] getAssignments()
protected void setAssignments(int[] assignments)
assignments - The assignments of examples to exemplars (clusters).
public int getChangedCount()
protected void setChangedCount(int changedCount)
changedCount - The number of changed cluster assignments.
protected java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> getClusters()
protected void setClusters(java.util.HashMap<java.lang.Integer,CentroidCluster<DataType>> clusters)
clusters - The current clusters.
public DivergenceFunction<? super DataType,? super DataType> getDivergenceFunction()
Description copied from interface: DivergenceFunctionContainer
Gets the divergence function used by this object.
getDivergenceFunction in interface DivergenceFunctionContainer<DataType,DataType>
public NamedValue<java.lang.Integer> getPerformance()
getPerformance in interface MeasurablePerformanceAlgorithm
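The performance value is documented as the number of assignments changed on the last iteration. A minimal sketch of how such a count could be derived is to compare the assignment arrays from two consecutive iterations. ChangedCountSketch and countChanged are hypothetical names; whether the class computes the count this way or incrementally (for example inside assignCluster) is not stated in this documentation.

```java
// Hypothetical helper for illustration; not part of the documented API.
final class ChangedCountSketch {

    /** Counts how many examples have a different exemplar than in the previous iteration. */
    static int countChanged(int[] previous, int[] current) {
        if (previous.length != current.length) {
            throw new IllegalArgumentException("assignment arrays differ in length");
        }
        int changed = 0;
        for (int i = 0; i < current.length; i++) {
            if (previous[i] != current[i]) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        int[] before = {1, 1, 1, 4, 4, 4};
        int[] after = {1, 1, 2, 4, 4, 4};
        // Only the example at index 2 switched exemplars.
        System.out.println(countChanged(before, after)); // prints 1
    }
}
```

A count of zero over one or more iterations is a natural signal that the assignments have stabilized, although the exact stopping rule used by the class is not described here.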