This module contains worker classes that select representative sample points for each cluster (“benchmark points”).
- class clusterking.benchmark.AbstractBenchmark
Subclass this class to implement algorithms to choose benchmark points from all the points (in parameter space) that correspond to one cluster.
Set the column of the dataframe of the Data object that contains the cluster information.
- abstract run(data)
- class clusterking.benchmark.AbstractBenchmarkResult(data, bpoints, md)
- __init__(data, bpoints, md)
- class clusterking.benchmark.Benchmark
Selects benchmark points based on a figure of merit that is calculated with the metric. You have to use set_metric() to specify the metric (as for the HierarchyCluster class). The default figure of merit (“sum”) chooses as benchmark point the point that minimizes the sum of the distances to all other points in the same cluster (where “distance” is measured with respect to the metric).
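The default “sum” rule can be illustrated without clusterking itself. The following is a minimal sketch, assuming the points of one cluster are given as rows of a plain NumPy array and that the metric is Euclidean; the function name pick_benchmark is hypothetical and not part of the library.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pick_benchmark(points: np.ndarray) -> int:
    """Return the row index of the point whose summed distance to all
    other rows is smallest (hypothetical helper, not clusterking API)."""
    # pdist returns the condensed distance vector; squareform expands it
    # to a symmetric n x n matrix with zeros on the diagonal.
    dist = squareform(pdist(points, "euclidean"))
    # Row sums are the "sum" figure of merit; the minimum wins.
    return int(np.argmin(dist.sum(axis=1)))


# Five points of one (made-up) cluster, lying on a line for readability.
cluster = np.array([[0.0], [1.0], [3.0], [10.0], [11.0]])
benchmark_index = pick_benchmark(cluster)
print(benchmark_index, cluster[benchmark_index])
```

With an even number of collinear points, two rows can tie; np.argmin then returns the first of them.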
- set_metric(*args, **kwargs) → None
Select a metric in one of the following ways:
If no positional arguments are given, we choose the Euclidean metric.
If the first positional argument is a string, we pick one of the metrics that are defined in scipy.spatial.distance.pdist by that name (all additional arguments will be passed to this function).
If the first positional argument is a function, we take this function (and add all additional arguments to it).
...(): Euclidean metric
...("euclidean"): Also Euclidean metric
...(lambda data: scipy.spatial.distance.pdist(data.data(), 'euclidean')): Also Euclidean metric
...("minkowski", p=2): Minkowski distance with p = 2
See https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html for more information.
*args – see description above
**kwargs – see description above
Function that takes a Data object as its only parameter and returns a reduced distance matrix.
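The three ways of selecting a metric can be sketched as a small dispatch function. This is an illustration only, not the library's implementation: make_metric is a hypothetical name, it operates on plain NumPy arrays instead of Data objects, and in the string case it forwards keyword arguments only, since recent SciPy versions accept metric parameters of pdist as keywords.

```python
from typing import Callable

import numpy as np
from scipy.spatial.distance import pdist


def make_metric(*args, **kwargs) -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical helper mirroring the three cases described above."""
    if not args:
        # Case 1: no positional arguments, fall back to Euclidean.
        return lambda rows: pdist(rows, "euclidean")
    first, rest = args[0], args[1:]
    if isinstance(first, str):
        # Case 2: name of a metric known to scipy.spatial.distance.pdist.
        return lambda rows: pdist(rows, first, **kwargs)
    if callable(first):
        # Case 3: user-supplied function; extra arguments are appended.
        return lambda rows: first(rows, *rest, **kwargs)
    raise TypeError("first argument must be a string or a callable")


rows = np.array([[0.0, 0.0], [3.0, 4.0]])
d_default = make_metric()(rows)
d_minkowski = make_metric("minkowski", p=2)(rows)
d_cityblock = make_metric("cityblock")(rows)
d_custom = make_metric(lambda r: pdist(r, "euclidean"))(rows)
```

Each returned function yields the condensed distance vector produced by pdist, with one entry per pair of rows.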
- set_fom(fct: Callable, *args, **kwargs) → None
Set a figure of merit. The default figure of merit (“sum”) chooses as benchmark point the point that minimizes the sum of the distances to all other points in the same cluster (where “distance” is measured with respect to the metric). In general, we choose the point that minimizes self.fom(<metric>), i.e. the default case corresponds to self.fom = lambda x: np.sum(x, axis=1), which you could have also set by calling set_fom(np.sum, axis=1).
fct – Function that takes the metric as first argument
*args – Positional arguments that are added to the positional arguments of fct after the metric
**kwargs – Keyword arguments for the function
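To see how the choice of figure of merit changes the selected point, here is a small sketch that binds extra arguments to a function in the way the parameter description suggests. The helper bind_fom and the toy distance matrix are assumptions made for illustration; they are not taken from clusterking.

```python
import numpy as np


def bind_fom(fct, *args, **kwargs):
    """Hypothetical helper: return a figure of merit that calls fct with
    the distance matrix first, followed by the bound arguments."""
    return lambda dist: fct(dist, *args, **kwargs)


# Toy cluster: five points on a line, with a square matrix of pairwise
# absolute differences standing in for the metric.
x = np.array([0.0, 1.0, 2.0, 3.0, 12.0])
dist = np.abs(x[:, None] - x[None, :])

fom_sum = bind_fom(np.sum, axis=1)  # summed distance to all other points
fom_max = bind_fom(np.max, axis=1)  # distance to the farthest point

best_by_sum = int(np.argmin(fom_sum(dist)))
best_by_max = int(np.argmin(fom_max(dist)))
print(best_by_sum, best_by_max)
```

The summed distance favors the median-like point (index 2 here), while the maximum distance favors the point closest to the middle of the cluster's extent (index 3), so an outlier such as 12.0 affects the two choices differently.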
- class clusterking.benchmark.BenchmarkResult(data, bpoints, md)