DataType - The type of the data to cluster. This is typically defined by the metric used.
ClusterType - The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.

@PublicationReference(author={"Martin Ester","Hans-Peter Kriegel","Jörg Sander","Xiaowei Xu"}, title="A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.", type=Journal, publication="AAAI Press", pages=-5, year=1996, url="https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf")
public class DBSCANClusterer<DataType extends Vectorizable,ClusterType extends Cluster<DataType>>
extends AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
implements BatchClusterer<DataType,ClusterType>
The DBSCAN algorithm requires three parameters: a distance metric, a
neighborhood radius, and the minimum number of surrounding neighbors for a
point to be considered non-noise. It clusters by iterating point by point and
grouping points that are close together (in the same neighborhood). Points
that are not in any neighborhood are labeled as noise. Noise points are
grouped into the first resultant cluster.
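The procedure described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the implementation of this class: the name `DbscanSketch`, the use of `double[]` points with Euclidean distance, counting a point as its own neighbor, and marking noise with the label -1 (rather than collecting noise into a cluster) are all choices made for the example. The neighborhood query is a brute-force scan over every point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, self-contained sketch of DBSCAN; not part of the library.
final class DbscanSketch {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    // Brute-force neighborhood query: scans every point, so O(n) per call.
    // Assumption: a point counts as its own neighbor (distance 0 <= eps).
    static List<Integer> regionQuery(double[][] points, int p, double eps) {
        List<Integer> neighbors = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            double sum = 0.0;
            for (int d = 0; d < points[p].length; d++) {
                double diff = points[p][d] - points[q][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) {
                neighbors.add(q);
            }
        }
        return neighbors;
    }

    // Returns one label per point: NOISE (-1) or a cluster id starting at 0.
    static int[] cluster(double[][] points, double eps, int minSamples) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) {
                continue; // already assigned to a cluster or marked as noise
            }
            List<Integer> neighbors = regionQuery(points, p, eps);
            if (neighbors.size() < minSamples) {
                labels[p] = NOISE; // may later be claimed as a border point
                continue;
            }
            labels[p] = clusterId;
            Deque<Integer> frontier = new ArrayDeque<>(neighbors);
            while (!frontier.isEmpty()) {
                int q = frontier.poll();
                if (labels[q] == NOISE) {
                    labels[q] = clusterId; // border point: joins, does not expand
                }
                if (labels[q] != UNVISITED) {
                    continue;
                }
                labels[q] = clusterId;
                List<Integer> qNeighbors = regionQuery(points, q, eps);
                if (qNeighbors.size() >= minSamples) {
                    frontier.addAll(qNeighbors); // q is dense, keep growing
                }
            }
            clusterId++;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0.0, 0.0}, {0.1, 0.0}, {0.0, 0.1},
            {5.0, 5.0}, {5.1, 5.0}, {5.0, 5.1},
            {10.0, 10.0}
        };
        // prints [0, 0, 0, 1, 1, 1, -1]
        System.out.println(Arrays.toString(cluster(points, 0.5, 3)));
    }
}
```

In the sample data, the two tight groups of three points each form a cluster, and the isolated point at (10, 10) has no neighbors other than itself, so it is labeled as noise.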
A KD tree can serve as the spatial index only under certain conditions, one of
which is that the distance function is a true Metric (not a Semimetric like
CosineDistanceMetric). When one of these conditions is not met, each
neighborhood query has O(n) complexity, giving the overall algorithm O(n^2)
time complexity. If a KD tree is used, queries have O(log n) complexity,
giving the overall algorithm O(n log n) complexity.

Modifier and Type | Field and Description |
---|---|
static double | DEFAULT_EPS: The default eps is 0.5. |
static int | DEFAULT_MAX_ITERATIONS: The default maximum number of iterations is 2147483647 (Integer.MAX_VALUE). |
static int | DEFAULT_MIN_SAMPLES: The default minimum samples is 5. |
data, keepGoing
maxIterations
DEFAULT_ITERATION, iteration
Constructor and Description |
---|
DBSCANClusterer(double eps, int minSamples, Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator): Creates a new instance of DBSCANClusterer. |
DBSCANClusterer(Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator): Creates a new instance of DBSCANClusterer. |
Modifier and Type | Method and Description |
---|---|
protected void | cleanupAlgorithm(): Called to clean up the learning algorithm's state after learning has finished. |
DBSCANClusterer<DataType,ClusterType> | clone(): This makes public the clone method on the Object class and removes the exception that it throws. |
ClusterType | getCluster(int i): Gets the cluster at this index. |
int | getClusterCount(): Gets the number of clusters. |
protected java.util.ArrayList<ClusterType> | getClusters(): Gets the current clusters. |
ClusterCreator<ClusterType,DataType> | getCreator(): Gets the cluster creator. |
Semimetric<? super DataType> | getMetric(): Gets the distance metric the clustering uses. |
double | getMinSamples(): Gets the minimum number of samples. |
double | getNeighborhoodRadius(): Gets the neighborhood radius. |
int | getPointIndex(): Gets the point index. |
java.util.ArrayList<DataType> | getPoints(): Gets the list of points. |
java.util.ArrayList<ClusterType> | getResult(): Gets the current result of the algorithm. |
KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> | getSpatialIndex(): Gets the spatial index. |
protected boolean | initializeAlgorithm(): Called to initialize the learning algorithm's state based on the data that is stored in the data field. |
void | setClusterCount(int count): Sets the number of clusters. |
protected void | setClusters(java.util.ArrayList<ClusterType> clusters): Sets the current clusters. |
void | setCreator(ClusterCreator<ClusterType,DataType> creator): Sets the cluster creator. |
void | setCreator(KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> spatialIndex): Sets the spatial index. |
void | setMetric(Semimetric<? super DataType> metric): Sets the distance metric the clustering uses. |
void | setMinSamples(int minSamples): Sets the minimum number of samples. |
void | setNeighborhoodRadius(double eps): Sets the neighborhood radius. |
void | setPointIndex(int index): Sets the point index. |
void | setPoints(java.util.ArrayList<DataType> points): Sets the list of points. |
protected boolean | step(): Called to take a single step of the learning algorithm. |
getData, getKeepGoing, learn, setData, setKeepGoing, stop
getMaxIterations, isResultValid, setMaxIterations
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
learn
getMaxIterations, setMaxIterations
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener
isResultValid
public static final double DEFAULT_EPS
public static final int DEFAULT_MIN_SAMPLES
public static final int DEFAULT_MAX_ITERATIONS
public DBSCANClusterer(Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator)
metric - The distance metric the clustering uses.
creator - The cluster creator.
public DBSCANClusterer(double eps, int minSamples, Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator)
eps - The neighborhood radius.
minSamples - The minimum number of surrounding neighbors for a point to be considered non-noise.
metric - The distance metric the clustering uses.
creator - The cluster creator.
public DBSCANClusterer<DataType,ClusterType> clone()
Description copied from class: AbstractCloneableSerializable
This makes public the clone method on the Object class and removes the
exception that it throws. Its default behavior is to automatically create a
clone of the exact type of object that the clone is called on and to copy all
primitives but to keep all references, which means it is a shallow copy.
Extensions of this class may want to override this method (but call
super.clone()) to implement a "smart copy", that is, to target the most
common use case for creating a copy of the object. Because the default
behavior is a shallow copy, extending classes only need to handle fields that
need a deeper copy (or those that need to be reset). Some of the methods in
ObjectUtil may be helpful in implementing a custom clone method.
Note: The contract of this method is that you must use super.clone() as the
basis for your implementation.
clone in interface CloneableSerializable
clone in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
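The "smart copy" contract described above can be illustrated with a small standalone class. This is a sketch only: `SketchState` and its fields are hypothetical names invented for the example, and it uses the standard `Cloneable` interface directly rather than the library's own base classes. It starts from `super.clone()`, which yields a shallow copy, and then replaces the one mutable reference field with a fresh copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class used only to illustrate the clone contract.
class SketchState implements Cloneable {
    double eps = 0.5;                          // primitive: copied by super.clone()
    List<Integer> labels = new ArrayList<>();  // reference: shared unless re-copied

    @Override
    public SketchState clone() {
        try {
            // Start from the shallow copy, as the contract requires.
            SketchState copy = (SketchState) super.clone();
            // Deepen only the field that must not be shared between copies.
            copy.labels = new ArrayList<>(this.labels);
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```

After cloning, adding a label to the copy leaves the original's list unchanged, while the primitive `eps` value carries over as-is. Without the extra line that re-copies `labels`, both objects would share one list.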
protected boolean initializeAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to initialize the learning algorithm's state based on the data that is stored in the data field.
initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
protected boolean step()
Description copied from class: AbstractAnytimeBatchLearner
Called to take a single step of the learning algorithm.
step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
protected void cleanupAlgorithm()
Description copied from class: AbstractAnytimeBatchLearner
Called to clean up the learning algorithm's state after learning has finished.
cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
public java.util.ArrayList<ClusterType> getResult()
Description copied from interface: AnytimeAlgorithm
Gets the current result of the algorithm.
getResult in interface AnytimeAlgorithm<java.util.Collection<ClusterType extends Cluster<DataType>>>
public double getNeighborhoodRadius()
public void setNeighborhoodRadius(double eps)
eps - The neighborhood radius (eps).
public double getMinSamples()
public void setMinSamples(int minSamples)
minSamples - The minimum number of samples.
public Semimetric<? super DataType> getMetric()
public void setMetric(Semimetric<? super DataType> metric)
metric - The metric.
protected java.util.ArrayList<ClusterType> getClusters()
public ClusterType getCluster(int i)
i - The index of the cluster.
protected void setClusters(java.util.ArrayList<ClusterType> clusters)
clusters - The current clusters.
public ClusterCreator<ClusterType,DataType> getCreator()
public void setCreator(ClusterCreator<ClusterType,DataType> creator)
creator - The creator for clusters.
public KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> getSpatialIndex()
public void setCreator(KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> spatialIndex)
spatialIndex - The spatial index (speeds up neighborhood queries).
public java.util.ArrayList<DataType> getPoints()
public void setPoints(java.util.ArrayList<DataType> points)
points - The points to be clustered.
public int getClusterCount()
public void setClusterCount(int count)
count - The number of clusters.
public int getPointIndex()
public void setPointIndex(int index)
index - The point index.