DBSCANClusterer (Cognitive Foundry)

java.lang.Object
- gov.sandia.cognition.util.AbstractCloneableSerializable
  - gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
    - gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm<ResultType>
      - gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
        - gov.sandia.cognition.learning.algorithm.clustering.DBSCANClusterer<DataType,ClusterType>

Type Parameters:

DataType - The type of the data to cluster. This is typically defined by the metric used.

ClusterType - The type of Cluster created by the algorithm. This is typically defined by the cluster creator function used.

All Implemented Interfaces:

AnytimeAlgorithm<java.util.Collection<ClusterType>>, IterativeAlgorithm, StoppableAlgorithm, AnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>, BatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>, BatchClusterer<DataType,ClusterType>, CloneableSerializable, java.io.Serializable, java.lang.Cloneable
```
@PublicationReference(author={"Martin Ester","Hans-Peter Kriegel","Jörg Sander","Xiaowei Xu"},
                      title="A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.",
                      type=Journal,
                      publication="AAAI Press",
                      pages=-5,
                      year=1996,
                      url="https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf")
public class DBSCANClusterer<DataType extends Vectorizable,ClusterType extends Cluster<DataType>>
extends AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType>,java.util.Collection<ClusterType>>
implements BatchClusterer<DataType,ClusterType>
```
The DBSCAN algorithm requires three parameters: a distance metric, a neighborhood radius (eps), and the minimum number of surrounding neighbors a point needs in order to be considered non-noise (minSamples). It iterates over the data point by point and groups points that lie close together (in the same neighborhood). Points that do not belong to any neighborhood are labeled as noise and are grouped into the first resulting cluster.
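
The grouping described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name DbscanSketch, the use of plain double[] points with Euclidean distance, and the convention that a point counts as its own neighbor are all assumptions made for illustration. Label 0 stands in for the noise group, mirroring the "first cluster holds the noise" behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone DBSCAN sketch; not the Cognitive Foundry source. */
class DbscanSketch {
    static final int UNVISITED = -1;
    static final int NOISE = 0; // label 0 plays the role of the noise group

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Brute-force neighborhood query: indices within eps of point i (includes i). */
    static List<Integer> neighbors(double[][] points, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < points.length; j++) {
            if (distance(points[i], points[j]) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    /** Returns one label per point: 0 for noise, 1..k for the k clusters found. */
    static int[] cluster(double[][] points, double eps, int minSamples) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) {
                continue; // already assigned while expanding an earlier cluster
            }
            List<Integer> seeds = neighbors(points, i, eps);
            if (seeds.size() < minSamples) {
                labels[i] = NOISE; // may be relabeled later as a border point
                continue;
            }
            clusterId++;
            labels[i] = clusterId;
            // Expand the cluster; the seed list grows while we walk it by index.
            for (int k = 0; k < seeds.size(); k++) {
                int q = seeds.get(k);
                if (labels[q] == NOISE) {
                    labels[q] = clusterId; // border point: joins but is not expanded
                    continue;
                }
                if (labels[q] != UNVISITED) {
                    continue;
                }
                labels[q] = clusterId;
                List<Integer> reachable = neighbors(points, q, eps);
                if (reachable.size() >= minSamples) {
                    seeds.addAll(reachable); // q is a core point, keep growing
                }
            }
        }
        return labels;
    }
}
```

Calling `DbscanSketch.cluster(points, 0.5, 3)` on two tight groups plus one distant point assigns the groups labels 1 and 2 and leaves the distant point in group 0.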

This implementation conditionally uses a KD tree to store the data points and answer neighborhood queries efficiently. The KD tree is used only when the metric is a Metric (not merely a Semimetric, such as CosineDistanceMetric). Otherwise, each neighborhood query has O(n) complexity, giving the overall algorithm O(n^2) time complexity. When a KD tree is used, each query has O(log n) complexity, giving the overall algorithm O(n log n) complexity.
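
The fallback path can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the nested Semimetric and Metric interfaces are simplified hypothetical stand-ins for the Foundry types of the same name, and canUseSpatialIndex, bruteForceNeighborhood, and the distanceEvaluations counter are invented for the example. It shows the type check that decides whether an index is usable and why one linear scan per point adds up to n^2 distance evaluations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the index-or-scan decision; not the library source. */
class NeighborhoodQuerySketch {
    /** Stand-in: a distance that need not satisfy the triangle inequality. */
    interface Semimetric<T> {
        double evaluate(T first, T second);
    }

    /** Stand-in: a semimetric that also satisfies the triangle inequality. */
    interface Metric<T> extends Semimetric<T> {
    }

    static final Metric<double[]> EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    };

    static final Semimetric<double[]> COSINE = (a, b) -> {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / Math.sqrt(na * nb);
    };

    /** Counts calls to the distance function made by the brute-force scan. */
    static long distanceEvaluations = 0;

    /** A KD tree is only considered when the distance is a full metric. */
    static boolean canUseSpatialIndex(Semimetric<?> metric) {
        return metric instanceof Metric;
    }

    /** Linear scan: one distance evaluation per stored point, so O(n) per query. */
    static <T> List<T> bruteForceNeighborhood(
            List<T> points, T query, double eps, Semimetric<? super T> metric) {
        List<T> result = new ArrayList<>();
        for (T candidate : points) {
            distanceEvaluations++;
            if (metric.evaluate(query, candidate) <= eps) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

Running the scan once for each of n points performs n * n evaluations, which is the O(n^2) case; an index that answers each query in O(log n) would reduce the total to O(n log n).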

Since:

4.0.0

Author:

Quinn McNamara

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static double`	`DEFAULT_EPS` The default eps is 0.5.
`static int`	`DEFAULT_MAX_ITERATIONS` The default maximum number of iterations is 2147483647 (Integer.MAX_VALUE).
`static int`	`DEFAULT_MIN_SAMPLES` The default minimum samples is 5.

Fields inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
data, keepGoing

Fields inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
maxIterations

Fields inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
DEFAULT_ITERATION, iteration

Constructor Summary

Constructors
Constructor and Description
`DBSCANClusterer(double eps, int minSamples, Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator)` Creates a new instance of DBSCANClusterer.
`DBSCANClusterer(Semimetric<? super DataType> metric, ClusterCreator<ClusterType,DataType> creator)` Creates a new instance of DBSCANClusterer.

Method Summary

Modifier and Type	Method and Description
`protected void`	`cleanupAlgorithm()` Called to clean up the learning algorithm's state after learning has finished.
`DBSCANClusterer<DataType,ClusterType>`	`clone()` This makes public the clone method on the `Object` class and removes the exception that it throws.
`ClusterType`	`getCluster(int i)` Get the cluster at this index.
`int`	`getClusterCount()` Gets the number of clusters.
`protected java.util.ArrayList<ClusterType>`	`getClusters()` Gets the current clusters.
`ClusterCreator<ClusterType,DataType>`	`getCreator()` Gets the cluster creator.
`Semimetric<? super DataType>`	`getMetric()` Gets the distance metric the clustering uses.
`double`	`getMinSamples()` Gets the minimum number of samples.
`double`	`getNeighborhoodRadius()` Gets the neighborhood radius.
`int`	`getPointIndex()` Gets the point index.
`java.util.ArrayList<DataType>`	`getPoints()` Gets the list of points.
`java.util.ArrayList<ClusterType>`	`getResult()` Gets the current result of the algorithm.
`KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>>`	`getSpatialIndex()` Gets the spatial index.
`protected boolean`	`initializeAlgorithm()` Called to initialize the learning algorithm's state based on the data that is stored in the data field.
`void`	`setClusterCount(int count)` Sets the number of clusters.
`protected void`	`setClusters(java.util.ArrayList<ClusterType> clusters)` Sets the current clusters.
`void`	`setCreator(ClusterCreator<ClusterType,DataType> creator)` Sets the cluster creator.
`void`	`setCreator(KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> spatialIndex)` Sets the spatial index.
`void`	`setMetric(Semimetric<? super DataType> metric)` Sets the distance metric the clustering uses.
`void`	`setMinSamples(int minSamples)` Sets the minimum number of samples.
`void`	`setNeighborhoodRadius(double eps)` Sets the neighborhood radius.
`void`	`setPointIndex(int index)` Sets the point index.
`void`	`setPoints(java.util.ArrayList<DataType> points)` Sets the list of points.
`protected boolean`	`step()` Called to take a single step of the learning algorithm.

Methods inherited from class gov.sandia.cognition.learning.algorithm.AbstractAnytimeBatchLearner
getData, getKeepGoing, learn, setData, setKeepGoing, stop

Methods inherited from class gov.sandia.cognition.algorithm.AbstractAnytimeAlgorithm
getMaxIterations, isResultValid, setMaxIterations

Methods inherited from class gov.sandia.cognition.algorithm.AbstractIterativeAlgorithm
addIterativeAlgorithmListener, fireAlgorithmEnded, fireAlgorithmStarted, fireStepEnded, fireStepStarted, getIteration, getListeners, removeIterativeAlgorithmListener, setIteration, setListeners

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface gov.sandia.cognition.learning.algorithm.BatchLearner
learn

Methods inherited from interface gov.sandia.cognition.algorithm.AnytimeAlgorithm
getMaxIterations, setMaxIterations

Methods inherited from interface gov.sandia.cognition.algorithm.IterativeAlgorithm
addIterativeAlgorithmListener, getIteration, removeIterativeAlgorithmListener

Methods inherited from interface gov.sandia.cognition.algorithm.StoppableAlgorithm
isResultValid

- Field Detail
  - DEFAULT_EPS
```
public static final double DEFAULT_EPS
```
    The default eps is 0.5.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MIN_SAMPLES
```
public static final int DEFAULT_MIN_SAMPLES
```
    The default minimum samples is 5.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MAX_ITERATIONS
```
public static final int DEFAULT_MAX_ITERATIONS
```
    The default maximum number of iterations is 2147483647 (Integer.MAX_VALUE).
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - DBSCANClusterer
```
public DBSCANClusterer(Semimetric<? super DataType> metric,
                       ClusterCreator<ClusterType,DataType> creator)
```
    Creates a new instance of DBSCANClusterer.
    
    Parameters:
    
    metric - The distance metric to use.
    
    creator - The cluster creator.
  - DBSCANClusterer
```
public DBSCANClusterer(double eps,
                       int minSamples,
                       Semimetric<? super DataType> metric,
                       ClusterCreator<ClusterType,DataType> creator)
```
    Creates a new instance of DBSCANClusterer.
    
    Parameters:
    
    eps - The neighborhood radius.
    
    minSamples - The minimum number of surrounding neighbors for a point to be considered non-noise.
    
    metric - The distance metric to use.
    
    creator - The cluster creator.
- Method Detail
  - clone
```
public DBSCANClusterer<DataType,ClusterType> clone()
```
    Description copied from class: AbstractCloneableSerializable
    
    This makes public the clone method on the Object class and removes the exception that it throws. Its default behavior is to automatically create a clone of the exact type of object that the clone is called on and to copy all primitives but to keep all references, which means it is a shallow copy. Extensions of this class may want to override this method (but call super.clone()) to implement a "smart copy", that is, to target the most common use case for creating a copy of the object. Because the default behavior is a shallow copy, extending classes only need to handle fields that need a deeper copy (or those that need to be reset). Some of the methods in ObjectUtil may be helpful in implementing a custom clone method. Note: The contract of this method is that you must use super.clone() as the basis for your implementation.
    
    Specified by:
    
    clone in interface CloneableSerializable
    
    Overrides:
    
    clone in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
    
    Returns:
    
    A clone of this object.
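    
    The contract above can be illustrated with a minimal sketch built only on java.lang.Cloneable. The class CloneSketch and its fields are hypothetical and do not appear in the library; the point is the pattern of starting from super.clone() and then re-copying only the fields that must not be shared.

```java
import java.util.ArrayList;

/** Hypothetical example of a "smart copy" built on super.clone(). */
class CloneSketch implements Cloneable {
    // Primitive field: super.clone() copies its value.
    double eps = 0.5;

    // Reference field: super.clone() copies only the reference, so the
    // original and the clone would share one list unless it is re-copied.
    ArrayList<double[]> points = new ArrayList<>();

    @Override
    public CloneSketch clone() {
        try {
            // Start from the shallow copy, as the contract requires.
            CloneSketch copy = (CloneSketch) super.clone();
            // Give the clone its own list. The double[] elements themselves
            // are still shared; copy them too if they may be mutated.
            copy.points = new ArrayList<>(this.points);
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new AssertionError("Cloneable is implemented", e);
        }
    }
}
```

    After cloning, adding a point to the clone's list leaves the original's list unchanged, while eps carries over by value.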
  - initializeAlgorithm
```
protected boolean initializeAlgorithm()
```
    Description copied from class: AbstractAnytimeBatchLearner
    
    Called to initialize the learning algorithm's state based on the data that is stored in the data field. The return value indicates if the algorithm can be run or not based on the initialization.
    
    Specified by:
    
    initializeAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
    
    Returns:
    
    True if the learning algorithm can be run and false if it cannot.
  - step
```
protected boolean step()
```
    Description copied from class: AbstractAnytimeBatchLearner
    
    Called to take a single step of the learning algorithm.
    
    Specified by:
    
    step in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
    
    Returns:
    
    True if another step can be taken and false if the algorithm should halt.
  - cleanupAlgorithm
```
protected void cleanupAlgorithm()
```
    Description copied from class: AbstractAnytimeBatchLearner
    
    Called to clean up the learning algorithm's state after learning has finished.
    
    Specified by:
    
    cleanupAlgorithm in class AbstractAnytimeBatchLearner<java.util.Collection<? extends DataType extends Vectorizable>,java.util.Collection<ClusterType extends Cluster<DataType>>>
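    
    How a driver might call initializeAlgorithm, step, and cleanupAlgorithm can be sketched as below. The loop is inferred only from the method descriptions above and is an assumption, not the library's source; AnytimeLoopSketch, CountToThree, and the run method are hypothetical names.

```java
/** Hypothetical driver showing one plausible order of the three hooks. */
abstract class AnytimeLoopSketch<R> {
    protected int iteration = 0;
    protected int maxIterations = Integer.MAX_VALUE;

    protected abstract boolean initializeAlgorithm();

    protected abstract boolean step();

    protected abstract void cleanupAlgorithm();

    public abstract R getResult();

    /** Initializes, steps until told to stop or the cap is hit, then cleans up. */
    public R run() {
        if (!initializeAlgorithm()) {
            return null; // initialization reported that the algorithm cannot run
        }
        boolean keepGoing = true;
        while (keepGoing && iteration < maxIterations) {
            iteration++;
            keepGoing = step(); // false means the algorithm should halt
        }
        cleanupAlgorithm();
        return getResult();
    }
}

/** Toy subclass: each step increments a counter and halts once it reaches three. */
class CountToThree extends AnytimeLoopSketch<Integer> {
    private int count;

    @Override
    protected boolean initializeAlgorithm() {
        count = 0;
        return true;
    }

    @Override
    protected boolean step() {
        count++;
        return count < 3;
    }

    @Override
    protected void cleanupAlgorithm() {
        // Nothing to release in this toy example.
    }

    @Override
    public Integer getResult() {
        return count;
    }
}
```

    In this sketch a lower maxIterations cuts the run short, which is the sense in which a partial result is available at any time.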
  - getResult
```
public java.util.ArrayList<ClusterType> getResult()
```
    Description copied from interface: AnytimeAlgorithm
    
    Gets the current result of the algorithm.
    
    Specified by:
    
    getResult in interface AnytimeAlgorithm<java.util.Collection<ClusterType extends Cluster<DataType>>>
    
    Returns:
    
    Current result of the algorithm.
  - getNeighborhoodRadius
```
public double getNeighborhoodRadius()
```
    Gets the neighborhood radius.
    
    Returns:
    
    The eps.
  - setNeighborhoodRadius
```
public void setNeighborhoodRadius(double eps)
```
    Sets the neighborhood radius.
    
    Parameters:
    
    eps - The eps.
  - getMinSamples
```
public double getMinSamples()
```
    Gets the minimum number of samples.
    
    Returns:
    
    The minSamples.
  - setMinSamples
```
public void setMinSamples(int minSamples)
```
    Sets the minimum number of samples.
    
    Parameters:
    
    minSamples - The minSamples.
  - getMetric
```
public Semimetric<? super DataType> getMetric()
```
    Gets the distance metric the clustering uses.
    
    Returns:
    
    The metric.
  - setMetric
```
public void setMetric(Semimetric<? super DataType> metric)
```
    Sets the distance metric the clustering uses.
    
    Parameters:
    
    metric - The metric.
  - getClusters
```
protected java.util.ArrayList<ClusterType> getClusters()
```
    Gets the current clusters.
    
    Returns:
    
    The current clusters.
  - getCluster
```
public ClusterType getCluster(int i)
```
    Get the cluster at this index.
    
    Parameters:
    
    i - The index of the cluster.
    
    Returns:
    
    The cluster at the given index.
  - setClusters
```
protected void setClusters(java.util.ArrayList<ClusterType> clusters)
```
    Sets the current clusters.
    
    Parameters:
    
    clusters - The current clusters.
  - getCreator
```
public ClusterCreator<ClusterType,DataType> getCreator()
```
    Gets the cluster creator.
    
    Returns:
    
    The cluster creator.
  - setCreator
```
public void setCreator(ClusterCreator<ClusterType,DataType> creator)
```
    Sets the cluster creator.
    
    Parameters:
    
    creator - The creator for clusters.
  - getSpatialIndex
```
public KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> getSpatialIndex()
```
    Gets the spatial index.
    
    Returns:
    
    The spatial index.
  - setCreator
```
public void setCreator(KDTree<DataType,java.lang.Double,InputOutputPair<DataType,java.lang.Double>> spatialIndex)
```
    Sets the spatial index.
    
    Parameters:
    
    spatialIndex - The spatial index (speeds up neighborhood queries).
  - getPoints
```
public java.util.ArrayList<DataType> getPoints()
```
    Gets the list of points.
    
    Returns:
    
    The points being clustered.
  - setPoints
```
public void setPoints(java.util.ArrayList<DataType> points)
```
    Sets the list of points.
    
    Parameters:
    
    points - The points to be clustered.
  - getClusterCount
```
public int getClusterCount()
```
    Gets the number of clusters.
    
    Returns:
    
    The number of clusters.
  - setClusterCount
```
public void setClusterCount(int count)
```
    Sets the number of clusters.
    
    Parameters:
    
    count - The number of clusters.
  - getPointIndex
```
public int getPointIndex()
```
    Gets the point index.
    
    Returns:
    
    The point index.
  - setPointIndex
```
public void setPointIndex(int index)
```
    Sets the point index.
    
    Parameters:
    
    index - The point index.
