Clusterer

class CLUEstering.clusterer(dc: float, rhoc: float, dm: float | None = None, seed_dc: float | None = None, ppbin: int = 128)

Bases: object

Wrapper class for performing clustering using the CLUE algorithm.

Parameters:

dc : float: Spatial parameter controlling the region for local density calculation.
rhoc : float: Density threshold separating seeds from outliers.
dm : float: Spatial parameter controlling the region for follower search.
seed_dc : float: Spatial parameter controlling the region for seed search.
ppbin : int: Average number of points per tile.
kernel : clue_kernels.Algo.kernel: Kernel used to calculate local density.
clust_data : ClusteringDataSoA: Container for input data.
clust_prop : cluster_properties: Container for clustering results.
elapsed_time : float: Execution time of the algorithm in nanoseconds.
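
To make the roles of dc and rhoc concrete, here is a minimal brute-force sketch of a local density calculation. It is not the library's tiled implementation: the function name local_density is hypothetical, and whether a point counts itself and whether the dc boundary is inclusive are assumptions made for illustration.

```python
import math

def local_density(points, weights, dc, kernel=lambda dist: 1.0):
    """For each point, sum kernel(dist) * weight over all points within dc.

    Brute-force O(n^2) sketch; each point is assumed to count itself.
    """
    densities = []
    for p in points:
        rho = 0.0
        for q, w in zip(points, weights):
            dist = math.dist(p, q)
            if dist <= dc:  # assumed inclusive boundary
                rho += kernel(dist) * w
        densities.append(rho)
    return densities

points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0)]
weights = [1.0, 1.0, 1.0, 1.0]

rho = local_density(points, weights, dc=1.0)

# Points whose density falls below rhoc cannot become seeds.
rhoc = 2.0
passes_threshold = [r >= rhoc for r in rho]
```

In this toy input the three nearby points reinforce each other, while the isolated point stays below the threshold.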

set_params(dc: float, rhoc: float, dm: float | None = None, seed_dc: float | None = None, ppbin: int = 128) → None

Set parameters for the clustering algorithm.

Parameters:

dc : float: Spatial parameter for density calculation.
rhoc : float: Density threshold.
dm : float or None: Follower search region. Defaults to dc if None.
seed_dc : float or None: Seed search region. Defaults to dc if None.
ppbin : int: Average points per tile.
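
The defaulting rule described above (dm and seed_dc fall back to dc when None) can be sketched as follows. The helper resolve_params is hypothetical and only mirrors the documented behaviour; it is not the library's code.

```python
def resolve_params(dc, rhoc, dm=None, seed_dc=None, ppbin=128):
    """Return the effective parameters, applying the documented fallbacks."""
    if dm is None:
        dm = dc        # follower search region defaults to dc
    if seed_dc is None:
        seed_dc = dc   # seed search region defaults to dc
    return {"dc": dc, "rhoc": rhoc, "dm": dm, "seed_dc": seed_dc, "ppbin": ppbin}

defaults = resolve_params(1.5, 10.0)
explicit = resolve_params(1.5, 10.0, dm=3.0)
```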

Read input data and initialize clustering-related attributes.

Parameters:

input_data : Union[pd.DataFrame, str, dict, list, np.ndarray]: Data to read. Can be one of:
    - pandas DataFrame: must contain one column per coordinate plus one column for weight.
    - string: path to a CSV file containing the data.
    - dict: dictionary with coordinates and weights.
    - list or ndarray: list of coordinate lists plus a weight list.
wrapped_coordinates : list or np.ndarray: List or array indicating which dimensions are periodic.

Raises:

ValueError – If the data format is not supported.

Returns:

None
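
As an illustration of the list and dict input forms described above, the sketch below builds the same three 2D points both ways. The dict key names and the exact nesting of the list form are assumptions; check them against the library before relying on them.

```python
# Three 2D points stored column-wise, plus one weight per point.
x0 = [0.0, 0.5, 10.0]
x1 = [0.0, 0.0, 10.0]
weight = [1.0, 1.0, 2.0]

# List form: one list per coordinate followed by the weight list (assumed layout).
as_list = [x0, x1, weight]

# Dict form: the key names "x0", "x1" and "weight" are hypothetical.
as_dict = {"x0": x0, "x1": x1, "weight": weight}

def is_rectangular(columns):
    """Check that every column holds the same number of entries."""
    return len({len(column) for column in columns}) == 1

list_ok = is_rectangular(as_list)
dict_ok = is_rectangular(as_dict.values())
```

Whatever the container, each coordinate column and the weight column need one entry per point.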

set_wrapped(wrapped_coords: list | ndarray) → None

Set which coordinates are periodic (wrapped).

Parameters:

wrapped_coords : list or np.ndarray: List or array indicating which dimensions are periodic.

Returns:

None
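
To show what a periodic (wrapped) coordinate means for distances, here is a minimal sketch. It assumes one boolean flag per dimension and an explicit period per wrapped dimension; both the flag layout and the periods argument are hypothetical and may not match how the library represents wrapping.

```python
import math

def wrapped_delta(a, b, period):
    """Shortest separation between a and b on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)

def distance(p, q, wrapped, periods):
    """Euclidean distance where flagged dimensions wrap around."""
    total = 0.0
    for a, b, is_wrapped, period in zip(p, q, wrapped, periods):
        delta = wrapped_delta(a, b, period) if is_wrapped else abs(a - b)
        total += delta * delta
    return math.sqrt(total)

# Second coordinate has period 10: the values 1 and 9 are only 2 apart.
d_wrapped = distance((0.0, 1.0), (0.0, 9.0), [False, True], [None, 10.0])
d_plain = distance((0.0, 1.0), (0.0, 9.0), [False, False], [None, None])
```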

choose_kernel(choice: str, parameters: list | None = None, function: types.LambdaType = <lambda>) → None

Set the kernel for local density calculation.

The default kernel is a flat kernel with parameter 0.5.

Parameters:

choice : str: Kernel type to use. Options are: ‘flat’, ‘exp’, ‘gaus’, or ‘custom’.
parameters : list or None: Parameters for the kernel. Required for ‘flat’, ‘exp’, ‘gaus’. Not required for ‘custom’.
function : function, optional: Function to use for a custom kernel.

Raises:

ValueError – If the number of parameters is invalid or the kernel choice is invalid.

Returns:

None
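
The sketch below illustrates how the four kernel choices might map a distance to a weight, including the two error cases listed under Raises. The helper make_kernel, the parameter counts (1, 2, 3) and the parameter order are assumptions for illustration, not the library's definitions.

```python
import math

# Assumed parameter counts per kernel type; the library's own checks may differ.
EXPECTED_PARAMS = {"flat": 1, "exp": 2, "gaus": 3}

def make_kernel(choice, parameters=None, function=None):
    """Return a callable mapping a distance to a kernel weight."""
    if choice == "custom":
        if function is None:
            raise ValueError("a custom kernel needs a function")
        return function
    if choice not in EXPECTED_PARAMS:
        raise ValueError(f"invalid kernel choice: {choice!r}")
    parameters = list(parameters or [])
    if len(parameters) != EXPECTED_PARAMS[choice]:
        raise ValueError(f"{choice!r} expects {EXPECTED_PARAMS[choice]} parameter(s)")
    if choice == "flat":
        (value,) = parameters
        return lambda dist: value  # constant weight at any distance
    if choice == "exp":
        rate, amplitude = parameters
        return lambda dist: amplitude * math.exp(-rate * dist)
    mean, std, amplitude = parameters
    return lambda dist: amplitude * math.exp(-((dist - mean) ** 2) / (2 * std ** 2))

flat = make_kernel("flat", [0.5])
gaus = make_kernel("gaus", [0.0, 1.0, 1.0])
```

A flat kernel weighs every neighbour equally, while the exponential and Gaussian forms let closer neighbours contribute more.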

property coords : ndarray

Return the coordinates of the points used for clustering.

Returns: Coordinates array.
Return type: np.ndarray

property weight : ndarray

Return the weights of the points.

Returns: Weights array.
Return type: np.ndarray

property n_dim : int

Return the number of dimensions.

Returns: Number of dimensions.
Return type: int

property n_points : int

Return the number of points in the dataset.

Returns: Number of points.
Return type: int

list_devices(backend: str = 'all') → None

List available devices for a given backend.

Parameters:

backend : str, optional: Backend to list devices for. Options: ‘all’, ‘cpu serial’, ‘cpu tbb’, ‘cpu openmp’, ‘gpu cuda’, ‘gpu hip’. Defaults to ‘all’.

Raises:

ValueError – If the backend is not valid.

Returns:

None
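
The backend check described under Raises can be sketched as a simple membership test against the option strings listed above. The helper check_backend is hypothetical; it only demonstrates the validation pattern and does not query any devices.

```python
# Backend names exactly as listed in the parameter description.
VALID_BACKENDS = ("all", "cpu serial", "cpu tbb", "cpu openmp", "gpu cuda", "gpu hip")

def check_backend(backend="all"):
    """Return the backend name unchanged, or raise ValueError if it is unknown."""
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"invalid backend {backend!r}; expected one of {', '.join(VALID_BACKENDS)}"
        )
    return backend

chosen = check_backend("cpu tbb")
```

Note that the names contain a space, so a value such as 'cpu_serial' would be rejected by a check like this one.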

run_clue(backend: str = 'cpu serial', block_size: int = 1024, device_id: int = 0, verbose: bool = False, dimensions: list | None = None) → None

Execute the CLUE clustering algorithm.

Parameters:

backend : str, optional: Backend to use for execution. Defaults to ‘cpu serial’.
block_size : int, optional: Size of blocks for parallel execution. Defaults to 1024.
device_id : int, optional: Device ID to run the algorithm on. Defaults to 0.
verbose : bool, optional: If True, prints execution time and number of clusters found. Defaults to False.
dimensions : list[int] or None, optional: Dimensions to consider. Defaults to None.

Returns:

None

Run the CLUE clustering algorithm on the input data.

Parameters:

data : Union[pd.DataFrame, str, dict, list, np.ndarray]: Input data. Can be a pandas DataFrame, a CSV file path (string), a dictionary with coordinate keys and weight, or a list/array containing coordinates and weights.
backend : str, optional: Backend to use for the algorithm execution.
block_size : int, optional: Block size for parallel execution.
device_id : int, optional: ID of the device to run the algorithm on.
verbose : bool, optional: If True, prints execution information.
dimensions : list or None, optional: List of dimensions to consider. If None, all are used.

Returns:

The clusterer object itself.

Return type:

clusterer

Raises:

Various exceptions if input data is invalid or clustering fails.

fit_predict(data: Union[pd.DataFrame, str, dict, list, np.ndarray], backend: str = 'cpu serial', block_size: int = 1024, device_id: int = 0, verbose: bool = False, dimensions: list | None = None) → ndarray

Run the CLUE clustering algorithm and return the cluster labels.

Parameters:

data : Union[pd.DataFrame, str, dict, list, np.ndarray]: Input data. Can be a pandas DataFrame, a CSV file path (string), a dictionary with coordinate keys and weight, or a list/array containing coordinates and weights.
backend : str, optional: Backend to use for the algorithm execution.
block_size : int, optional: Block size for parallel execution.
device_id : int, optional: ID of the device to run the algorithm on.
verbose : bool, optional: If True, prints execution information.
dimensions : list or None, optional: List of dimensions to consider. If None, all are used.

Returns:

Array containing the cluster index for every point.

Return type:

np.ndarray

Raises:

Various exceptions if input data is invalid or clustering fails.
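
A small usage sketch for fit_predict follows. The helper label_counts is hypothetical: it accepts any object exposing the fit_predict(data, backend=...) call documented above and tallies how many points received each label. The commented lines show how it might be wired to the real class; the file name data.csv is a placeholder.

```python
from collections import Counter

def label_counts(clust, data, backend="cpu serial"):
    """Run fit_predict on a clusterer-like object and count points per label."""
    labels = clust.fit_predict(data, backend=backend)
    return Counter(int(label) for label in labels)

# Possible usage with the library (not executed here):
#   import CLUEstering as clue
#   clust = clue.clusterer(1.0, 5.0)          # dc, rhoc
#   print(label_counts(clust, "data.csv"))    # placeholder CSV path
```

Keeping the helper duck-typed makes it easy to exercise with a stand-in object before running on real data.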

property n_clusters : int

Return the number of clusters found.

Returns: Number of clusters reconstructed by CLUE.
Return type: int

property cluster_ids : ndarray

Index of the cluster to which each point belongs.

Returns: Array mapping each point to its cluster.
Return type: np.ndarray

property labels : ndarray

Alias for cluster_ids.

Returns: Array mapping each point to its cluster.
Return type: np.ndarray

property cluster_points : ndarray

List of points for each cluster.

Returns: Array of arrays containing point indices per cluster.
Return type: np.ndarray

property points_per_cluster : ndarray

Number of points in each cluster.

Returns: Array containing the number of points in each cluster.
Return type: np.ndarray
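
The relationship between cluster_ids, cluster_points and points_per_cluster can be sketched as a grouping of point indices by label. The function group_by_cluster is hypothetical, and the use of -1 as the outlier label is an assumption; whether the library's arrays include outliers is not stated here.

```python
from collections import defaultdict

def group_by_cluster(labels, outlier_label=-1):
    """Map each cluster label to the indices of the points assigned to it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        if label != outlier_label:  # assumed outlier marker, skipped here
            groups[label].append(index)
    return dict(groups)

labels = [0, 0, 1, -1, 1, 0]
cluster_points = group_by_cluster(labels)
points_per_cluster = {label: len(indices) for label, indices in cluster_points.items()}
```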

property output_df : DataFrame

DataFrame containing cluster_ids.

Returns: Pandas DataFrame with cluster assignments.
Return type: pd.DataFrame

cluster_centroid(cluster_index: int) → ndarray

Computes the centroid coordinates of a specified cluster.

Parameters:

cluster_index : int: Index of the cluster.

Returns:

Coordinates of the cluster centroid.

Return type:

np.ndarray

Raises:

ValueError – If cluster_index is invalid.

cluster_centroids() → ndarray

Computes the centroids of all clusters.

Returns: Array of shape (n_clusters-1, n_dim) containing cluster centroids.
Return type: np.ndarray
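
As a rough illustration of a centroid computation, the sketch below takes the unweighted mean of the coordinates of all points carrying a given label. Whether the library weights points when computing centroids is not stated here, so the unweighted mean and the function name centroid are assumptions.

```python
def centroid(coords, labels, cluster_index):
    """Unweighted mean position of the points labelled cluster_index."""
    members = [point for point, label in zip(coords, labels) if label == cluster_index]
    if not members:
        raise ValueError(f"no points with cluster index {cluster_index}")
    n_dim = len(members[0])
    return [sum(point[d] for point in members) / len(members) for d in range(n_dim)]

coords = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 1]

first = centroid(coords, labels, 0)
second = centroid(coords, labels, 1)
```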

input_plotter(filepath: str | None = None, plot_title: str = '', title_size: float = 16, x_label: str = 'x', y_label: str = 'y', z_label: str = 'z', label_size: float = 16, pt_size: float = 1, pt_colour: str = 'b', grid: bool = True, grid_style: str = '--', grid_size: float = 0.2, x_ticks=None, y_ticks=None, z_ticks=None, **kwargs) → None

Plots the input points in 1D, 2D, or 3D space.

Parameters:

filepath : str or None: Path to save the plot. If None, the plot is shown interactively.
plot_title : str: Title of the plot.
title_size : float: Font size of the plot title.
x_label : str: Label for the x-axis.
y_label : str: Label for the y-axis.
z_label : str: Label for the z-axis.
label_size : float: Font size for axis labels.
pt_size : float: Size of the points.
pt_colour : str: Colour of the points.
grid : bool: Whether to display a grid.
grid_style : str: Line style of the grid.
grid_size : float: Line width of the grid.
x_ticks : list or None: Custom tick locations for x-axis.
y_ticks : list or None: Custom tick locations for y-axis.
z_ticks : list or None: Custom tick locations for z-axis (only for 3D plots).
kwargs : dict: Optional functions for converting coordinates.

Returns:

None

cluster_plotter(filepath: str | None = None, plot_title: str = '', title_size: float = 16, x_label: str = 'x', y_label: str = 'y', z_label: str = 'z', label_size: float = 16, outl_size: float = 10, pt_size: float = 10, grid: bool = True, grid_style: str = '--', grid_size: float = 0.2, x_ticks=None, y_ticks=None, z_ticks=None, **kwargs) → None

Plots clusters with different colours and outliers as gray crosses.

Parameters:

filepath : str or None: Path to save the plot. If None, the plot is shown interactively.
plot_title : str: Title of the plot.
title_size : float: Font size of the plot title.
x_label : str: Label for the x-axis.
y_label : str: Label for the y-axis.
z_label : str: Label for the z-axis.
label_size : float: Font size for axis labels.
outl_size : float: Marker size for outliers.
pt_size : float: Marker size for cluster points.
grid : bool: Whether to display a grid.
grid_style : str: Line style of the grid.
grid_size : float: Line width of the grid.
x_ticks : list or None: Custom tick locations for x-axis.
y_ticks : list or None: Custom tick locations for y-axis.
z_ticks : list or None: Custom tick locations for z-axis (only for 3D plots).
kwargs : dict: Optional functions for converting coordinates.

Returns:

None

to_csv(output_folder: str, file_name: str) → None

Creates a file containing the coordinates of all the points and their cluster_ids.

Parameters:

output_folder : str: Full path to the desired output folder.
file_name : str: Name of the file, with the ‘.csv’ suffix.

Returns:

None

import_clusterer(input_folder: str, file_name: str) → None

Imports the results of a previous clustering.

Parameters:

input_folder : str: Full path to the folder containing the CSV file.
file_name : str: Name of the file, with the ‘.csv’ suffix.

Raises:

ValueError – If the file does not exist or cannot be read correctly.

Returns:

None
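
The export and import steps described above can be pictured as a CSV round trip built from a folder and a file name. The sketch below uses only the standard library; the helpers write_clusters and read_clusters and the column names x0, x1, cluster_ids are hypothetical, and the file layout the library actually writes may differ.

```python
import csv
import os
import tempfile

def write_clusters(output_folder, file_name, coords, cluster_ids):
    """Write one row per point: its coordinates followed by its cluster id."""
    path = os.path.join(output_folder, file_name)
    n_dim = len(coords[0])
    header = [f"x{d}" for d in range(n_dim)] + ["cluster_ids"]  # assumed column names
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for point, cluster_id in zip(coords, cluster_ids):
            writer.writerow(list(point) + [cluster_id])
    return path

def read_clusters(input_folder, file_name):
    """Read the rows back as dictionaries; raise ValueError if the file is missing."""
    path = os.path.join(input_folder, file_name)
    if not os.path.isfile(path):
        raise ValueError(f"file not found: {path}")
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

# Round trip in a temporary folder so nothing is left on disk.
with tempfile.TemporaryDirectory() as folder:
    write_clusters(folder, "clusters.csv", [(0.0, 1.0), (2.0, 3.0)], [0, -1])
    rows = read_clusters(folder, "clusters.csv")
```

Values read back through csv are strings, so a real importer would still need to convert them to numeric types.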