Benchmark
This module contains worker classes that select representative sample points for each cluster (“benchmark points”).
AbstractBenchmark
- class clusterking.benchmark.AbstractBenchmark
Bases: clusterking.worker.DataWorker
Subclass this class to implement algorithms to choose benchmark points from all the points (in parameter space) that correspond to one cluster.
Benchmark
- class clusterking.benchmark.Benchmark
Bases: clusterking.benchmark.abstract_benchmark.AbstractBenchmark
Selects benchmark points based on a figure of merit that is calculated with the metric. Use set_metric() to specify the metric (as for the HierarchyCluster class). The default figure of merit (“sum”) chooses as benchmark point the point that minimizes the sum of the distances to all other points in the same cluster (where “distance” is measured with respect to the metric).
- set_metric(*args, **kwargs) → None
Select a metric in one of the following ways:
If no positional arguments are given, the Euclidean metric is chosen.
If the first positional argument is a string, the metric of that name defined in scipy.spatial.distance.pdist is used (all additional arguments are passed to this function).
If the first positional argument is a function, this function is used (and all additional arguments are passed to it).
Examples:
...(): Euclidean metric
...("euclidean"): Also Euclidean metric
...(lambda data: scipy.spatial.distance.pdist(data.data(), 'euclidean')): Also Euclidean metric
...("minkowski", p=2): Minkowski distance with p=2
See https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html for more information.
- Parameters
*args – see description above
**kwargs – see description above
- Returns
Function that takes Data object as only parameter and returns a reduced distance matrix.
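The equivalence of the metric choices listed above can be checked directly with SciPy. The following is a minimal sketch that calls scipy.spatial.distance.pdist on its own rather than through set_metric(); the sample points are hypothetical and chosen only so that the distances are easy to verify by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample points (one row per point); not taken from the library.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# pdist defaults to the Euclidean metric when no metric name is given.
d_default = pdist(points)
# Selecting the metric by name.
d_named = pdist(points, "euclidean")
# Minkowski distance with p=2 coincides with the Euclidean distance.
d_minkowski = pdist(points, "minkowski", p=2)

# pdist returns a reduced (condensed) distance matrix: a flat array holding
# the distances for the pairs (0,1), (0,2), (1,2).
# squareform expands it into the full symmetric matrix.
full = squareform(d_default)

print(d_default)
print(full.shape)
```

The reduced form stores each pair of points only once, which is why three points yield three entries instead of a 3×3 matrix.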
- set_fom(fct: Callable, *args, **kwargs) → None
Set a figure of merit. The default figure of merit (“sum”) chooses as benchmark point the point that minimizes the sum of the distances to all other points in the same cluster (where “distance” is measured with respect to the metric). In general, the point that minimizes self.fom(<metric>) is chosen, i.e. the default case corresponds to self.fom = lambda x: np.sum(x, axis=1), which you could also have set by calling self.set_fom(np.sum, axis=1).
- Parameters
fct – Function that takes the metric as first argument
*args – Positional arguments that are added to the positional arguments of fct after the metric
**kwargs – Keyword arguments for the function
- Returns
None
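To illustrate how a figure of merit selects a benchmark point, here is a minimal NumPy sketch. It does not use the clusterking API: the helpers pairwise_distances and pick_benchmark are hypothetical, and it assumes that the figure of merit is applied to the full square distance matrix of one cluster, as the expression np.sum(x, axis=1) suggests.

```python
import numpy as np


def pairwise_distances(points):
    """Full square matrix of Euclidean distances (hypothetical helper)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pick_benchmark(points, fom=np.sum, **fom_kwargs):
    """Index of the point that minimizes the figure of merit.

    Assumption: fom maps the square distance matrix to one score per point.
    """
    scores = fom(pairwise_distances(points), **fom_kwargs)
    return int(np.argmin(scores))


# Hypothetical cluster: five points on a line, one of them far away.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]])

# Default case ("sum"): minimize the summed distance to all other points.
idx_sum = pick_benchmark(cluster, np.sum, axis=1)
# Alternative figure of merit: minimize the largest distance to any point.
idx_max = pick_benchmark(cluster, np.max, axis=1)

print(idx_sum, idx_max)  # prints "2 3"
```

With the "sum" figure of merit the point at x=2 is selected, whereas minimizing the maximum distance selects the point at x=3, because the latter choice is pulled toward the outlier at x=10.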
- class clusterking.benchmark.BenchmarkResult(data, bpoints, md)
Bases: clusterking.benchmark.abstract_benchmark.AbstractBenchmarkResult