Benchmark
AbstractBenchmark
-
class
clusterking.benchmark.abstract_benchmark.
AbstractBenchmark
(data: clusterking.data.data.Data, cluster_column='cluster')
Bases: object
Subclass this class to implement algorithms to choose benchmark points from all the points (in parameter space) that correspond to one cluster.
-
__init__
(data: clusterking.data.data.Data, cluster_column='cluster')
Parameters:
- data – Data object
- cluster_column – Column name of the clusters
-
cluster_column
The column from which we read the cluster information. Defaults to ‘cluster’.
-
Benchmark
-
class
clusterking.benchmark.benchmark.
Benchmark
(data, cluster_column='cluster')
Bases: clusterking.benchmark.abstract_benchmark.AbstractBenchmark
Selects benchmark points based on a figure of merit that is calculated with the metric. Use set_metric() to specify the metric (as for the HierarchyCluster class). By default, the figure of merit (“sum”) selects as the benchmark point the point that minimizes the sum of the distances to all other points in the same cluster, where “distance” is measured with respect to the metric.
-
__init__
(data, cluster_column='cluster')
Parameters:
- data – Data object
- cluster_column – Column name of the clusters
-
set_metric
(*args, **kwargs) → None
Select a metric in one of the following ways:
1. If no positional arguments are given, the Euclidean metric is used.
2. If the first positional argument is a string, the metric of that name defined in scipy.spatial.distance.pdist is used (all additional arguments are passed to this function).
3. If the first positional argument is a function, this function is used (and all additional arguments are passed to it).
Examples:
- ...(): Euclidean metric
- ...("euclidean"): Also Euclidean metric
- ...(lambda data: scipy.spatial.distance.pdist(data.data(), 'euclidean')): Also Euclidean metric
- ...("minkowski", p=2): Minkowski distance with p=2
See https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html for more information.
Parameters:
- *args –
- **kwargs –
Returns: Function that takes Data object as only parameter and returns a reduced distance matrix.
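The three selection rules described for set_metric() can be illustrated with a small standalone sketch. This is not clusterking's implementation: select_metric is a hypothetical helper, and it operates on a plain NumPy array (one row per point) rather than on a Data object. It also assumes that, for a named metric, only keyword arguments are forwarded to scipy.spatial.distance.pdist.

```python
import numpy as np
from scipy.spatial.distance import pdist


def select_metric(*args, **kwargs):
    """Hypothetical stand-in for the selection rules (not clusterking code).

    Returns a function that maps a 2D array of points to a condensed
    (reduced) distance matrix.
    """
    if not args:
        # Rule 1: no positional arguments, so use the Euclidean metric.
        return lambda points: pdist(points, "euclidean")
    first, rest = args[0], args[1:]
    if isinstance(first, str):
        # Rule 2: a metric name understood by pdist; keyword arguments
        # (for example p=2 for "minkowski") are forwarded to pdist.
        return lambda points: pdist(points, first, **kwargs)
    if callable(first):
        # Rule 3: a user-supplied function; extra arguments are appended.
        return lambda points: first(points, *rest, **kwargs)
    raise ValueError("Unsupported metric specification")


# Three collinear points with pairwise Euclidean distances 5, 10 and 5.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
euclidean = select_metric()(points)
minkowski = select_metric("minkowski", p=2)(points)
custom = select_metric(lambda p: pdist(p, "euclidean"))(points)
```

All three calls yield the same condensed distance vector here, since the Minkowski distance with p=2 coincides with the Euclidean distance.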
-
set_fom
(fct: Callable, *args, **kwargs) → None
Set a figure of merit. The default figure of merit (“sum”) selects as the benchmark point the point that minimizes the sum of the distances to all other points in the same cluster, where “distance” is measured with respect to the metric. In general, the point that minimizes self.fom(<metric>) is chosen; the default case corresponds to self.fom = lambda x: np.sum(x, axis=1), which could also be set by calling self.set_fom(np.sum, axis=1).
Parameters:
- fct – Function that takes the metric as first argument
- *args – Positional arguments that are added to the positional arguments of fct after the metric
- **kwargs – Keyword arguments for the function
Returns: None
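As a rough illustration of how a figure of merit picks a benchmark point, the sketch below applies the default “sum” rule to a toy cluster. It does not use clusterking itself: pick_benchmark is a hypothetical helper, and it assumes that the figure of merit receives the full square distance matrix of one cluster, so that axis=1 sums over the distances from each point to all others.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def default_fom(distances):
    """Default figure of merit: sum of distances from each point to the rest."""
    return np.sum(distances, axis=1)


def pick_benchmark(points, fom=default_fom):
    """Hypothetical helper: index of the point that minimizes the figure of merit."""
    # Expand the condensed distance vector into a symmetric square matrix.
    distances = squareform(pdist(points, "euclidean"))
    return int(np.argmin(fom(distances)))


# Toy cluster of five points on a line.
cluster_points = np.array([[0.0], [1.0], [2.0], [5.0], [10.0]])

# Default "sum": row sums are 18, 15, 14, 17, 32, so index 2 is selected.
sum_benchmark = pick_benchmark(cluster_points)

# Alternative figure of merit: minimize the largest distance to any point.
# Row maxima are 10, 9, 8, 5, 10, so index 3 is selected.
max_benchmark = pick_benchmark(cluster_points, fom=lambda x: np.max(x, axis=1))
```

The two choices disagree on this toy cluster, which shows why the figure of merit is configurable: the sum favors a point near the bulk of the cluster, while the maximum favors a point that limits the worst-case distance.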
-