Cluster#

This subpackage provides classes to perform the actual clustering.

Different clustering algorithms correspond to different subclasses of the base class clusterking.cluster.Cluster (and inherit all of its methods).

Currently implemented:

HierarchyCluster: Hierarchical clustering (https://en.wikipedia.org/wiki/Hierarchical_clustering/)
KmeansCluster: k-means clustering (https://en.wikipedia.org/wiki/K-means_clustering/)

`Cluster`#

class clusterking.cluster.Cluster[source]#

Bases: clusterking.worker.DataWorker

Abstract base class of the Cluster classes. It defines the functions common to all clustering workers and is subclassed to implement specific clustering algorithms.

__init__()[source]#

Parameters: data – Data object

md#: Metadata

abstract run(data, **kwargs)[source]#: Implementation of the clustering. Should return an array-like object with the cluster number.
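The subclassing pattern described above can be illustrated with a stand-alone sketch. It does not import ClusterKinG: `SketchCluster` and `ThresholdCluster` are hypothetical names invented for this illustration, and plain numbers stand in for a Data object. The only point is that a concrete subclass overrides `run` and returns one cluster number per input row.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchCluster(ABC):
    """Hypothetical stand-in for an abstract clustering base class."""

    @abstractmethod
    def run(self, data, **kwargs):
        """Return an array-like with one cluster number per data point."""


class ThresholdCluster(SketchCluster):
    """Toy algorithm: cluster 0 at or below a threshold, cluster 1 above it."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def run(self, data, **kwargs):
        values = np.asarray(data, dtype=float)
        # Boolean mask converted to integer cluster numbers 0 and 1.
        return (values > self.threshold).astype(int)


labels = ThresholdCluster(threshold=2.5).run([1.0, 2.0, 3.0, 4.0])
print(labels)
```

Instantiating `SketchCluster` directly raises a `TypeError`, which is what marks `run` as a method every concrete algorithm has to provide.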

class clusterking.cluster.ClusterResult(data, md, clusters)[source]#

Bases: clusterking.result.DataResult

__init__(data, md, clusters)[source]#

get_clusters(indexed=False)[source]#

write(cluster_column='cluster')[source]#: Write results back in the Data object.

`HierarchyCluster`#

class clusterking.cluster.HierarchyCluster[source]#

Bases: clusterking.cluster.cluster.Cluster

__init__()[source]#

property max_d: Optional[float]#: Cutoff value set in set_max_d().

property metric: Callable#: Metric that was set in set_metric() (a function that takes a Data object as its only parameter and returns a reduced distance matrix).

set_metric(*args, **kwargs) → None[source]#

Select a metric in one of the following ways:

If no positional arguments are given, we choose the euclidean metric.
If the first positional argument is a string, we pick the metric of that name from those defined in scipy.spatial.distance.pdist (all additional arguments will be passed to this function).
If the first positional argument is a function, we take this function (and add all additional arguments to it).

Examples:

...(): Euclidean metric
...("euclidean"): Also Euclidean metric
...(lambda data: scipy.spatial.distance.pdist(data.data(), 'euclidean')): Also Euclidean metric
...("minkowski", p=2): Minkowski distance with p=2.

See https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html for more information.

Parameters

*args – see description above
**kwargs – see description above

Returns

Function that takes Data object as only parameter and returns a reduced distance matrix.
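The equivalence of the example calls above can be checked directly against scipy. This is a minimal sketch that bypasses ClusterKinG: the `points` array is made-up toy data standing in for the numerical content of a Data object, and only `scipy.spatial.distance.pdist` is called.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy stand-in for the numerical content of a Data object: 4 points in 2D.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

d_default = pdist(points)                      # pdist defaults to "euclidean"
d_named = pdist(points, "euclidean")           # metric selected by name
d_minkowski = pdist(points, "minkowski", p=2)  # extra keyword forwarded to the metric

# pdist returns the condensed ("reduced") distance matrix: one entry per
# unordered pair of points, i.e. n * (n - 1) / 2 values for n points.
print(len(d_default))
print(np.allclose(d_default, d_named), np.allclose(d_default, d_minkowski))
```

The Minkowski distance with p=2 is the Euclidean distance, so all three arrays agree.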

set_hierarchy_options(method='complete', optimal_ordering=False)[source]#

Configure hierarchy building

Parameters

method – See reference on scipy.cluster.hierarchy.linkage
optimal_ordering – See reference on scipy.cluster.hierarchy.linkage

set_max_d(max_d) → None[source]#

Set the cutoff value of the hierarchy that then gives the clusters. This corresponds to the t argument of scipy.cluster.hierarchy.fcluster.

Parameters: max_d – float
Returns: None

set_fcluster_options(**kwargs) → None[source]#

Set additional keyword options for our call to scipy.cluster.hierarchy.fcluster.

Parameters: kwargs – Keyword arguments
Returns: None
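How the hierarchy options, the cutoff and the fcluster options fit together can be sketched with the underlying scipy calls. This is an illustration under assumptions, not ClusterKinG's implementation: the toy `points` are invented, and `criterion="distance"` is chosen here for clarity (scipy's own default for `fcluster` is `"inconsistent"`; which value ClusterKinG passes by default is not stated in this section).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

# Corresponds to set_hierarchy_options(method="complete", optimal_ordering=False).
hierarchy = linkage(pdist(points), method="complete", optimal_ordering=False)

# Corresponds to set_max_d(2.0): the cutoff becomes the t argument of fcluster.
max_d = 2.0

# Corresponds to set_fcluster_options(criterion="distance") (assumed here).
clusters = fcluster(hierarchy, t=max_d, criterion="distance")
print(clusters)
```

With this cutoff, the two points within each pair (distance 1) end up together, while the two pairs (about 10 apart) stay separate.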

run(data, reuse_hierarchy_from: Optional[clusterking.cluster.hierarchy_cluster.HierarchyClusterResult] = None)[source]#

Parameters

data –
reuse_hierarchy_from – Reuse the hierarchy from a HierarchyClusterResult object.

Returns:
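Reusing a hierarchy is useful because building the linkage is the expensive step, while cutting it at a different max_d is cheap. The sketch below shows that idea with scipy alone; it is not ClusterKinG code, and the data and cutoff values are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# One-dimensional toy data: two tight pairs and one far-away point.
points = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])

# Build the hierarchy once ...
hierarchy = linkage(pdist(points), method="complete")

# ... and cut it at several cutoff values without rebuilding it.
n_clusters = {}
for max_d in (2.0, 10.0, 30.0):
    labels = fcluster(hierarchy, t=max_d, criterion="distance")
    n_clusters[max_d] = len(set(labels))

print(n_clusters)
```

Larger cutoffs merge more of the hierarchy, so the number of clusters can only stay the same or decrease as max_d grows.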

class clusterking.cluster.HierarchyClusterResult(data, md, clusters, hierarchy, worker_id)[source]#

Bases: clusterking.cluster.cluster.ClusterResult

__init__(data, md, clusters, hierarchy, worker_id)[source]#

property hierarchy#

property worker_id#: ID of the HierarchyCluster worker that generated this object.

property data_id: int#: ID of the data object that the HierarchyCluster worker was run on.

dendrogram(output: Union[None, str, pathlib.Path] = None, ax=None, show=False, **kwargs)[source]#

Creates dendrogram

Parameters

output – If supplied, we save the dendrogram there
ax – An axes object if you want to add the dendrogram to an existing axes rather than creating a new one
show – If true, the dendrogram is shown in a viewer.
**kwargs – Additional keyword options to scipy.cluster.hierarchy.dendrogram

Returns

The matplotlib.pyplot.Axes object
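Since the extra keyword arguments are forwarded to scipy.cluster.hierarchy.dendrogram, it helps to know what that function does with a hierarchy. The following sketch calls scipy directly on made-up data and uses `no_plot=True`, so no figure is drawn and only the computed layout is returned.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
hierarchy = linkage(pdist(points), method="complete")

# no_plot=True skips drawing and returns the dendrogram layout as a dict.
info = dendrogram(hierarchy, no_plot=True)

print(info["leaves"])       # order of the original points along the axis
print(len(info["dcoord"]))  # one entry per merge, i.e. n - 1 for n points
```

Other scipy options such as `orientation` or `color_threshold` are passed in the same way when a plot is actually drawn.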

`KmeansCluster`#

class clusterking.cluster.KmeansCluster[source]#

Bases: clusterking.cluster.cluster.Cluster

k-means clustering (https://en.wikipedia.org/wiki/K-means_clustering/) as implemented in sklearn.cluster.

Example:

import clusterking as ck
d = ck.Data("/path/to/data.sql")    # Load some data
c = ck.cluster.KmeansCluster()      # Init worker class
c.set_kmeans_options(n_clusters=5)  # Set options for clustering
r = c.run(d)                        # Perform clustering on data
r.write()                           # Write results back to data

__init__()[source]#

set_kmeans_options(**kwargs) → None[source]#

Configure the clustering algorithm.

Parameters: **kwargs – Keyword arguments to sklearn.cluster.KMeans().
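The keyword arguments are those of sklearn.cluster.KMeans, so any option documented there can be set. As a rough sketch of what such options do, the snippet below calls scikit-learn directly on invented toy data; the particular values of `n_init` and `random_state` are illustrative choices, not ClusterKinG defaults.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

# The kind of dictionary one would pass as set_kmeans_options(**kmeans_options).
kmeans_options = dict(n_clusters=2, n_init=10, random_state=0)

labels = KMeans(**kmeans_options).fit_predict(points)
print(labels)
```

Fixing `random_state` makes the result reproducible; the cluster numbers themselves are arbitrary labels, so only the grouping is meaningful.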

run(data) → clusterking.cluster.kmeans_cluster.KmeansClusterResult[source]#

class clusterking.cluster.KmeansClusterResult(data, md, clusters)[source]#: Bases: clusterking.cluster.cluster.ClusterResult

ClusterKinG 1.1.0 documentation
