Clusterer
- class CLUEstering.clusterer(dc: float, rhoc: float, dm: float | None = None, seed_dc: float | None = None, ppbin: int = 128)
Bases: object
Wrapper class for performing clustering using the CLUE algorithm.
- Parameters:
- dc : float
Spatial parameter controlling the region for local density calculation.
- rhoc : float
Density threshold separating seeds from outliers.
- dm : float
Spatial parameter controlling the region for follower search.
- ppbin : int
Average number of points per tile.
- kernel : clue_kernels.Algo.kernel
Kernel used to calculate local density.
- clust_data : ClusteringDataSoA
Container for input data.
- clust_prop : cluster_properties
Container for clustering results.
- elapsed_time : float
Execution time of the algorithm in nanoseconds.
- set_params(dc: float, rhoc: float, dm: float | None = None, seed_dc: float | None = None, ppbin: int = 128) -> None
Set parameters for the clustering algorithm.
- Parameters:
- dc : float
Spatial parameter for density calculation.
- rhoc : float
Density threshold.
- dm : float or None
Follower search region. Defaults to dc if None.
- seed_dc : float or None
Seed search region. Defaults to dc if None.
- ppbin : int
Average points per tile.
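The fallback rule for dm and seed_dc can be sketched in plain Python. The helper `resolve_params` below is hypothetical and not part of CLUEstering; it only mirrors the rule stated above, that both values fall back to dc when left as None. The numeric values are arbitrary.

```python
def resolve_params(dc, rhoc, dm=None, seed_dc=None, ppbin=128):
    """Hypothetical helper (not part of CLUEstering) mirroring the
    documented fallback: dm and seed_dc default to dc when None."""
    return {
        "dc": dc,
        "rhoc": rhoc,
        "dm": dc if dm is None else dm,
        "seed_dc": dc if seed_dc is None else seed_dc,
        "ppbin": ppbin,
    }


# Only dc and rhoc given: dm and seed_dc both take the value of dc.
defaults = resolve_params(dc=1.5, rhoc=10.0)

# dm given explicitly: only seed_dc falls back to dc.
explicit = resolve_params(dc=1.5, rhoc=10.0, dm=3.0)
```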
- read_data(input_data: DataFrame | str | dict | list | ndarray, wrapped_coords: list | ndarray | None = None) -> None
Read input data and initialize clustering-related attributes.
- Parameters:
- input_data : Union[pd.DataFrame, str, dict, list, np.ndarray]
Data to read. Can be one of:
- pandas DataFrame: must contain one column per coordinate plus one column for weight.
- string: path to a CSV file containing the data.
- dict: dictionary with coordinates and weights.
- list or ndarray: list of coordinate lists plus a weight list.
- wrapped_coords : list, np.ndarray, or None
List or array indicating which dimensions are periodic.
- Raises:
ValueError – If the data format is not supported.
- Returns:
None
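As an illustration of the accepted input shapes, the sketch below prepares the same four points as a dict, as a CSV file, and as a list, using only the standard library. The names "x0", "x1", and "weight", and the ordering of the list form, are assumptions made for this example; check them against the installed version before relying on them.

```python
import csv
import os
import tempfile

# Four points in two dimensions, each with a weight.
# The key names "x0", "x1" and "weight" are an assumed convention.
data = {
    "x0": [0.0, 0.1, 5.0, 5.1],
    "x1": [0.0, 0.2, 5.0, 4.9],
    "weight": [1.0, 1.0, 1.0, 1.0],
}


def write_csv(points, path):
    """Write a dict of equal-length columns to a CSV file, one row per point."""
    columns = list(points)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(zip(*(points[name] for name in columns)))


# CSV form: a header row followed by one row per point.
csv_path = os.path.join(tempfile.mkdtemp(), "points.csv")
write_csv(data, csv_path)

# List form (assumed layout): the coordinate lists followed by the weight list.
as_list = [data["x0"], data["x1"], data["weight"]]
```

Any of `data`, `csv_path`, or `as_list` would then be handed to read_data, provided the layout matches what the library expects.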
- set_wrapped(wrapped_coords: list | ndarray) -> None
Set which coordinates are periodic (wrapped).
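For a periodic coordinate, the separation between two values is measured around the wrap rather than along a straight line. A minimal sketch of that idea follows; the function `wrapped_distance` and the explicit `period` argument are illustrative and are not taken from CLUEstering.

```python
import math


def wrapped_distance(a, b, period):
    """Shortest separation between a and b on a coordinate that
    repeats every `period` units (illustrative helper)."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


# An azimuthal angle with period 2*pi: the two values sit on either
# side of the wrap, so they are close even though |a - b| is large.
d_wrapped = wrapped_distance(-3.1, 3.1, 2 * math.pi)
d_plain = abs(-3.1 - 3.1)
```

Here `d_wrapped` is much smaller than `d_plain`, which is why two such points can end up in the same cluster once the coordinate is marked as wrapped.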
- choose_kernel(choice: str, parameters: list | None = None, function: types.LambdaType = <function clusterer.<lambda>>) -> None
Set the kernel for local density calculation.
The default kernel is a flat kernel with parameter 0.5.
- Parameters:
- choice : str
Kernel type to use. Options are: ‘flat’, ‘exp’, ‘gaus’, or ‘custom’.
- parameters : list or None
Parameters for the kernel. Required for ‘flat’, ‘exp’, ‘gaus’. Not required for ‘custom’.
- function : function, optional
Function to use for a custom kernel.
- Raises:
ValueError – If the number of parameters is invalid or the kernel choice is invalid.
- Returns:
None
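To make the kernel options concrete, the sketch below writes three kernel shapes as plain Python functions of the distance between two points and uses them in a weighted sum. The functional forms, the parameter names and order, and the way the density is accumulated are all assumptions made for illustration (a constant for ‘flat’, a decaying exponential for ‘exp’, a Gaussian for ‘gaus’); the library source is the reference for the exact definitions.

```python
import math


def flat_kernel(dist, flat=0.5):
    """Constant contribution regardless of distance (assumed form)."""
    return flat


def exp_kernel(dist, exp_avg, amplitude):
    """Exponentially decaying contribution (assumed form)."""
    return amplitude * math.exp(-exp_avg * dist)


def gaus_kernel(dist, mean, std, amplitude):
    """Gaussian contribution centred on `mean` (assumed form)."""
    return amplitude * math.exp(-((dist - mean) ** 2) / (2 * std ** 2))


def local_density(distances, weights, kernel):
    """Sum of kernel(distance) * weight over neighbouring points
    (assumed accumulation rule)."""
    return sum(kernel(d) * w for d, w in zip(distances, weights))


# Three neighbours of a point, all assumed to lie within dc.
neighbour_distances = [0.0, 0.5, 1.0]
neighbour_weights = [1.0, 1.0, 2.0]

rho_flat = local_density(neighbour_distances, neighbour_weights, flat_kernel)
rho_gaus = local_density(
    neighbour_distances,
    neighbour_weights,
    lambda d: gaus_kernel(d, mean=0.0, std=1.0, amplitude=1.0),
)
```

With a flat kernel every neighbour counts equally apart from its weight, whereas the Gaussian and exponential forms give nearby points more influence on the density.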
- property coords : ndarray
Return the coordinates of the points used for clustering.
- property weight : ndarray
Return the weights of the points.
- property n_dim : int
Return the number of dimensions.
- property n_points : int
Return the number of points in the dataset.
- run_clue(backend: str = 'cpu serial', block_size: int = 1024, device_id: int = 0, verbose: bool = False, dimensions: list | None = None) -> None
Execute the CLUE clustering algorithm.
- Parameters:
- backend : str, optional
Backend to use for execution. Defaults to ‘cpu serial’.
- block_size : int, optional
Size of blocks for parallel execution. Defaults to 1024.
- device_id : int, optional
Device ID to run the algorithm on. Defaults to 0.
- verbose : bool, optional
If True, prints execution time and number of clusters found.
- dimensions : list[int] or None, optional
Optional list of dimensions to consider. Defaults to None.
- Returns:
None
- fit(data: DataFrame | str | dict | list | ndarray, backend: str = 'cpu serial', block_size: int = 1024, device_id: int = 0, verbose: bool = False, dimensions: list | None = None) -> Clusterer
Run the CLUE clustering algorithm on the input data.
- Parameters:
- data : Union[pd.DataFrame, str, dict, list, np.ndarray]
Input data. Can be a pandas DataFrame, a CSV file path (string), a dictionary with coordinate keys and weight, or a list/array containing coordinates and weights.
- backend : str, optional
Backend to use for the algorithm execution.
- block_size : int, optional
Block size for parallel execution.
- device_id : int, optional
ID of the device to run the algorithm on.
- verbose : bool, optional
If True, prints execution information.
- dimensions : list or None, optional
List of dimensions to consider. If None, all are used.
- Returns:
Returns the clusterer object itself.
- Return type:
Clusterer
- Raises:
Various exceptions if input data is invalid or clustering fails.
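A typical call sequence might look like the sketch below. It uses only the constructor and the fit method as documented on this page; the parameter values, the sample points, and the dict key names ("x0", "x1", "weight") are made up for illustration. The import sits inside the function so that the block can be read and loaded without the package installed; calling the function does require it.

```python
def cluster_points(points, dc=1.0, rhoc=2.0, dm=2.0):
    """Cluster `points` with CLUEstering and return the fitted object.

    The values of dc, rhoc and dm are arbitrary placeholders and need
    tuning for real data.
    """
    import CLUEstering as clue  # lazy import: needed only when called

    clust = clue.clusterer(dc, rhoc, dm)
    # fit returns the clusterer itself, so the result can be used directly.
    return clust.fit(points, backend="cpu serial", verbose=False)


# Two well-separated groups of points in 2D (made-up values).
# The key names are an assumed convention for the dict input form.
points = {
    "x0": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
    "x1": [0.0, 0.1, 0.0, 5.0, 5.1, 5.0],
    "weight": [1.0] * 6,
}
```

With the package installed, `clust = cluster_points(points)` followed by `clust.n_clusters` and `clust.labels` would give the number of clusters and the per-point cluster indices.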
- fit_predict(data: [], backend: str = 'cpu serial', block_size: int = 1024, device_id: int = 0, verbose: bool = False, dimensions: list | None = None) -> ndarray
Run the CLUE clustering algorithm and return the cluster labels.
- Parameters:
- data : Union[pd.DataFrame, str, dict, list, np.ndarray]
Input data. Can be a pandas DataFrame, a CSV file path (string), a dictionary with coordinate keys and weight, or a list/array containing coordinates and weights.
- backend : str, optional
Backend to use for the algorithm execution.
- block_size : int, optional
Block size for parallel execution.
- device_id : int, optional
ID of the device to run the algorithm on.
- verbose : bool, optional
If True, prints execution information.
- dimensions : list or None, optional
List of dimensions to consider. If None, all are used.
- Returns:
Array containing the cluster index for every point.
- Return type:
np.ndarray
- Raises:
Various exceptions if input data is invalid or clustering fails.
- property n_clusters : int
Return the number of clusters found.
- property cluster_ids : ndarray
Index of the cluster to which each point belongs.
- property labels : ndarray
Alias for cluster_ids.
- property cluster_points : ndarray
List of points for each cluster.
- property points_per_cluster : ndarray
Number of points in each cluster.
- property output_df : DataFrame
DataFrame containing cluster_ids.
- cluster_centroid(cluster_index: int) -> ndarray
Computes the centroid coordinates of a specified cluster.
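The relationship between cluster_ids, points_per_cluster, and a cluster centroid can be illustrated without the library. The sketch assumes that outliers carry the label -1, that cluster indices run from 0 upwards without gaps, and that the centroid is the unweighted mean of the member coordinates. These are assumptions for the example, not statements about how CLUEstering computes its results.

```python
from collections import Counter

# Six points in 2D with made-up labels; -1 marks an outlier (assumed).
coords = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.3), (5.0, 5.0), (5.2, 5.0), (9.0, 0.0)]
cluster_ids = [0, 0, 0, 1, 1, -1]


def points_per_cluster(labels):
    """Count the points in each cluster, ignoring outliers."""
    counts = Counter(label for label in labels if label >= 0)
    return [counts[i] for i in range(len(counts))]


def centroid(points, labels, cluster_index):
    """Unweighted mean position of the points in one cluster."""
    members = [p for p, label in zip(points, labels) if label == cluster_index]
    n_dim = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(n_dim))


sizes = points_per_cluster(cluster_ids)
centre_1 = centroid(coords, cluster_ids, 1)
```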
- input_plotter(filepath: str | None = None, plot_title: str = '', title_size: float = 16, x_label: str = 'x', y_label: str = 'y', z_label: str = 'z', label_size: float = 16, pt_size: float = 1, pt_colour: str = 'b', grid: bool = True, grid_style: str = '--', grid_size: float = 0.2, x_ticks=None, y_ticks=None, z_ticks=None, **kwargs) -> None
Plots the input points in 1D, 2D, or 3D space.
- Parameters:
- filepath : str or None
Path to save the plot. If None, the plot is shown interactively.
- plot_title : str
Title of the plot.
- title_size : float
Font size of the plot title.
- x_label : str
Label for the x-axis.
- y_label : str
Label for the y-axis.
- z_label : str
Label for the z-axis.
- label_size : float
Font size for axis labels.
- pt_size : float
Size of the points.
- pt_colour : str
Colour of the points.
- grid : bool
Whether to display a grid.
- grid_style : str
Line style of the grid.
- grid_size : float
Line width of the grid.
- x_ticks : list or None
Custom tick locations for x-axis.
- y_ticks : list or None
Custom tick locations for y-axis.
- z_ticks : list or None
Custom tick locations for z-axis (only for 3D plots).
- kwargs : dict
Optional functions for converting coordinates.
- Returns:
None
- Return type:
None
- cluster_plotter(filepath: str | None = None, plot_title: str = '', title_size: float = 16, x_label: str = 'x', y_label: str = 'y', z_label: str = 'z', label_size: float = 16, outl_size: float = 10, pt_size: float = 10, grid: bool = True, grid_style: str = '--', grid_size: float = 0.2, x_ticks=None, y_ticks=None, z_ticks=None, **kwargs) -> None
Plots clusters with different colors and outliers as gray crosses.
- Parameters:
- filepath : str or None
Path to save the plot. If None, the plot is shown interactively.
- plot_title : str
Title of the plot.
- title_size : float
Font size of the plot title.
- x_label : str
Label for the x-axis.
- y_label : str
Label for the y-axis.
- z_label : str
Label for the z-axis.
- label_size : float
Font size for axis labels.
- outl_size : float
Marker size for outliers.
- pt_size : float
Marker size for cluster points.
- grid : bool
Whether to display a grid.
- grid_style : str
Line style of the grid.
- grid_size : float
Line width of the grid.
- x_ticks : list or None
Custom tick locations for x-axis.
- y_ticks : list or None
Custom tick locations for y-axis.
- z_ticks : list or None
Custom tick locations for z-axis (only for 3D plots).
- kwargs : dict
Optional functions for converting coordinates.
- Returns:
None
- Return type:
None
- to_csv(output_folder: str, file_name: str) -> None
Creates a file containing the coordinates of all the points and their cluster_ids.
- import_clusterer(input_folder: str, file_name: str) -> None
Imports the results of a previous clustering.
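A saved result can be pictured as a table with one row per point: its coordinates followed by its cluster index. The sketch below writes and re-reads such a file with the standard csv module. The column names and both helper functions are hypothetical; the files handled by to_csv and import_clusterer may be laid out differently.

```python
import csv
import os
import tempfile


def save_clusters(path, coords, cluster_ids):
    """Write one row per point: its coordinates, then its cluster index."""
    n_dim = len(coords[0])
    header = [f"x{d}" for d in range(n_dim)] + ["cluster_ids"]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for point, label in zip(coords, cluster_ids):
            writer.writerow(list(point) + [label])


def load_clusters(path):
    """Read the file back into a list of coordinates and a list of labels."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        rows = list(reader)
    coords = [tuple(float(value) for value in row[:-1]) for row in rows]
    cluster_ids = [int(row[-1]) for row in rows]
    return coords, cluster_ids


coords = [(0.0, 0.0), (0.5, 0.25), (5.0, 5.0)]
labels = [0, 0, 1]
path = os.path.join(tempfile.mkdtemp(), "clusters.csv")
save_clusters(path, coords, labels)
restored_coords, restored_labels = load_clusters(path)
```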