Cluster¶

This subpackage provides classes to perform the actual clustering.

Different clustering algorithms correspond to different subclasses of the base class clusterking.cluster.Cluster (and inherit all of its methods).

Currently implemented:

HierarchyCluster: Hierarchical clustering (https://en.wikipedia.org/wiki/Hierarchical_clustering/)
KmeansCluster: K-means clustering (https://en.wikipedia.org/wiki/K-means_clustering/)

`Cluster`¶

class clusterking.cluster.Cluster[source]¶

Bases: clusterking.worker.DataWorker

Abstract base class of the Cluster classes. It defines the common functions and is subclassed to implement specific clustering algorithms.

__init__()[source]¶

Parameters: data – Data object

md = None¶

Metadata

run(data, **kwargs)[source]¶

Implementation of the clustering. Should return an array-like object containing the cluster numbers.
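The subclassing pattern described here can be sketched without importing clusterking at all. The classes below (ToyCluster, ThresholdCluster) are hypothetical stand-ins, not part of the library; they only illustrate an abstract run() that concrete workers override to return cluster numbers.

```python
from abc import ABC, abstractmethod


class ToyCluster(ABC):
    """Hypothetical stand-in for an abstract clustering worker."""

    def __init__(self):
        # Metadata gathered while configuring the worker.
        self.md = {}

    @abstractmethod
    def run(self, data, **kwargs):
        """Return an array-like object with the cluster numbers."""


class ThresholdCluster(ToyCluster):
    """Toy algorithm: values below the threshold go to cluster 0, others to 1."""

    def __init__(self, threshold):
        super().__init__()
        self.md["threshold"] = threshold

    def run(self, data, **kwargs):
        threshold = self.md["threshold"]
        return [0 if value < threshold else 1 for value in data]


labels = ThresholdCluster(threshold=2.5).run([1.0, 2.0, 3.0, 4.0])
```

Because run() is abstract, instantiating ToyCluster directly raises a TypeError; only subclasses that implement run() can be created.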

class clusterking.cluster.ClusterResult(data, md, clusters)[source]¶

Bases: clusterking.result.DataResult

__init__(data, md, clusters)[source]¶

get_clusters(indexed=False)[source]¶

write(cluster_column='cluster')[source]¶

Write results back in the Data object.

`HierarchyCluster`¶

class clusterking.cluster.HierarchyCluster[source]¶

Bases: clusterking.cluster.cluster.Cluster

__init__()[source]¶

max_d¶

Cutoff value set in set_max_d().

metric¶

Metric that was set in set_metric(): a function that takes a Data object as its only parameter and returns a reduced distance matrix.

set_metric(*args, **kwargs) → None[source]¶

Select a metric in one of the following ways:

If no positional arguments are given, we choose the Euclidean metric.

If the first positional argument is a string, we pick one of the metrics that are defined in scipy.spatial.distance.pdist by that name (all additional arguments will be passed to this function).

If the first positional argument is a function, we take this function (and add all additional arguments to it).

Examples:

...(): Euclidean metric

...("euclidean"): Also Euclidean metric

...(lambda data: scipy.spatial.distance.pdist(data.data(), 'euclidean')): Also Euclidean metric

...("minkowski", p=2): Minkowski distance with p=2.

See https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html for more information.

Parameters:

*args – see description above

**kwargs – see description above

Returns:
Function that takes Data object as only parameter and returns a reduced distance matrix.
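The three selection rules can be illustrated with a small stand-alone helper. This is a sketch under assumptions: make_metric is a hypothetical name, it works on a plain NumPy array instead of a Data object, and in the string case it forwards only keyword arguments to pdist.

```python
import numpy as np
from scipy.spatial.distance import pdist


def make_metric(*args, **kwargs):
    """Hypothetical helper mirroring the three selection rules.

    Returns a function that maps a 2D array of points to a condensed
    distance matrix (the flat output format of pdist).
    """
    if not args:
        # Rule 1: no positional argument -> Euclidean metric.
        return lambda points: pdist(points, "euclidean")
    first, rest = args[0], args[1:]
    if isinstance(first, str):
        # Rule 2: a string names a pdist metric; keyword options such as
        # p=2 are forwarded (positional extras are ignored in this sketch).
        return lambda points: pdist(points, first, **kwargs)
    if callable(first):
        # Rule 3: a user-supplied function receives the extra arguments.
        return lambda points: first(points, *rest, **kwargs)
    raise TypeError("metric must be a string or a callable")


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
default = make_metric()(points)
named = make_metric("euclidean")(points)
minkowski = make_metric("minkowski", p=2)(points)
custom = make_metric(lambda pts: pdist(pts, "euclidean"))(points)
```

All four calls yield the same condensed matrix for these points, since Minkowski distance with p=2 coincides with the Euclidean distance.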

set_hierarchy_options(method='complete', optimal_ordering=False)[source]¶

Configure how the hierarchy is built.

Parameters:

method – See reference on scipy.cluster.hierarchy.linkage

optimal_ordering – See reference on scipy.cluster.hierarchy.linkage
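Both options are arguments of scipy.cluster.hierarchy.linkage. The sketch below calls that function directly on a small, made-up data set to show how the method changes the merge heights; it does not go through HierarchyCluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Five made-up one-dimensional points forming two tight pairs and an outlier.
points = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
distances = pdist(points, "euclidean")

# Complete linkage merges on the largest pairwise distance between clusters,
# single linkage on the smallest one.
complete = linkage(distances, method="complete", optimal_ordering=False)
single = linkage(distances, method="single", optimal_ordering=True)

# Each row of a linkage matrix holds: the two merged cluster ids,
# the merge height, and the size of the newly formed cluster.
final_height_complete = complete[-1, 2]
final_height_single = single[-1, 2]
```

optimal_ordering only reorders the leaves for nicer dendrograms; it does not change the merge heights.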

set_max_d(max_d) → None[source]¶

Set the cutoff value of the hierarchy that then gives the clusters. This corresponds to the t argument of scipy.cluster.hierarchy.fcluster.

Parameters: max_d – float

Returns: None
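To see how the cutoff determines the clusters, the following sketch cuts one hierarchy at several values of t using scipy directly. The choice criterion="distance" is an assumption of this example (the text above does not state which criterion is used), and the data and the helper name cut are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

points = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
hierarchy = linkage(pdist(points), method="complete")


def cut(hierarchy, max_d):
    """Flat clusters whose internal merge heights do not exceed max_d."""
    # criterion="distance" is assumed here, see the note above.
    return fcluster(hierarchy, t=max_d, criterion="distance")


# A larger cutoff merges more points, so the number of clusters shrinks.
n_clusters = {
    max_d: len(np.unique(cut(hierarchy, max_d))) for max_d in (1.0, 6.0, 20.0)
}
```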

set_fcluster_options(**kwargs) → None[source]¶

Set additional keyword options for our call to scipy.cluster.hierarchy.fcluster.

Parameters: kwargs – Keyword arguments

Returns: None
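As an illustration of what such keyword options can do, the sketch below passes criterion="maxclust" to scipy.cluster.hierarchy.fcluster directly, so that t is read as a maximum number of clusters rather than a distance. The data and the options dictionary are made up for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

points = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
hierarchy = linkage(pdist(points), method="complete")

# Keyword options forwarded to fcluster; with "maxclust", t is the
# maximum number of flat clusters.
fcluster_options = {"criterion": "maxclust"}
labels = fcluster(hierarchy, t=2, **fcluster_options)
```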

run(data, reuse_hierarchy_from: Optional[clusterking.cluster.hierarchy_cluster.HierarchyClusterResult] = None)[source]¶

Parameters:

data – Data object

reuse_hierarchy_from – Reuse the hierarchy from a HierarchyClusterResult object.

Returns:

class clusterking.cluster.HierarchyClusterResult(data, md, clusters, hierarchy, worker_id)[source]¶

Bases: clusterking.cluster.cluster.ClusterResult

__init__(data, md, clusters, hierarchy, worker_id)[source]¶

hierarchy¶

worker_id¶

ID of the HierarchyCluster worker that generated this object.

data_id¶

ID of the data object that the HierarchyCluster worker was run on.

dendrogram(output: Union[None, str, pathlib.Path] = None, ax=None, show=False, **kwargs)[source]¶

Creates a dendrogram.

Parameters:

output – If supplied, the dendrogram is saved to this path

ax – An axes object if you want to add the dendrogram to an existing axes rather than creating a new one

show – If true, the dendrogram is shown in a viewer.

**kwargs – Additional keyword options to scipy.cluster.hierarchy.dendrogram

Returns:
The matplotlib.pyplot.Axes object
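The extra keyword options end up in scipy.cluster.hierarchy.dendrogram. The sketch below calls that function directly with no_plot=True, which returns the layout data without needing matplotlib; the data set and the chosen color_threshold are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

points = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
hierarchy = linkage(pdist(points), method="complete")

# no_plot=True computes the dendrogram layout without drawing anything.
# color_threshold is an example of an option one might pass as **kwargs.
layout = dendrogram(hierarchy, no_plot=True, color_threshold=6.0)

leaf_order = layout["leaves"]      # leaf indices, left to right
n_links = len(layout["dcoord"])    # one U-shaped link per merge
top_height = max(max(link) for link in layout["dcoord"])
```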

`KmeansCluster`¶

class clusterking.cluster.KmeansCluster[source]¶

Bases: clusterking.cluster.cluster.Cluster

K-means clustering (https://en.wikipedia.org/wiki/K-means_clustering/) as implemented in sklearn.cluster.

Example:

import clusterking as ck
d = ck.Data("/path/to/data.sql")    # Load some data
c = ck.cluster.KmeansCluster()      # Init worker class
c.set_kmeans_options(n_clusters=5)  # Set options for clustering
r = c.run(d)                        # Perform clustering on data
r.write()                           # Write results back to data

__init__()[source]¶

set_kmeans_options(**kwargs) → None[source]¶

Configure the clustering algorithm.

Parameters: **kwargs – Keyword arguments to sklearn.cluster.KMeans().
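For orientation, this is roughly what such keyword arguments look like when handed to sklearn.cluster.KMeans directly. The sketch uses made-up points and option values and bypasses KmeansCluster entirely.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of made-up points.
points = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
)

# Options of the kind one might pass as **kwargs; random_state makes the
# result reproducible, n_init sets the number of random restarts.
kmeans_options = {"n_clusters": 2, "n_init": 10, "random_state": 0}
labels = KMeans(**kmeans_options).fit_predict(points)
```

The numeric labels themselves are arbitrary; only the grouping of points is meaningful.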

run(data) → clusterking.cluster.kmeans_cluster.KmeansClusterResult[source]¶

class clusterking.cluster.KmeansClusterResult(data, md, clusters)[source]¶

Bases: clusterking.cluster.cluster.ClusterResult
