K-Means
- class sparseklearn.kmeans.KMeans(n_components=8, init='kmpp', tol=0.0001, n_init=10, n_passes=1, max_iter=300, means_init_array=None, **kwargs)
Sparsified K-Means clustering.
- Parameters
- n_components : int, default: 8
The number of clusters.
- init : {ndarray, ‘kmpp’, ‘random’}, default: ‘kmpp’
Initialization method:
ndarray: shape (n_components, P). Initial cluster centers; these must already be transformed.
‘kmpp’: picks initial cluster centers from the data with probability proportional to the distance of each datapoint to the current initial means. More expensive than ‘random’ but typically converges better. The centers are drawn from HDX if the sparsifier has access to it; otherwise they come from RHDX.
‘random’: picks initial cluster centers uniformly at random from the datapoints. The centers are drawn from HDX if the sparsifier has access to it; otherwise they come from RHDX.
- n_init : int, default: 10
Number of times to run k-means on new initializations. The best results are kept.
- max_iter : int, default: 300
Maximum number of iterations for each run.
- tol : float, default: 1e-4
Relative tolerance with regard to inertia, used to declare convergence.
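The ‘kmpp’ seeding described above can be illustrated with a small NumPy sketch on dense data. This is not the library's code: `kmpp_init` and `blobs` are hypothetical names, and the sketch weights candidates by squared distance, as textbook k-means++ does. The description above says "distance", so the exact weighting used by the library is an assumption here.

```python
import numpy as np

def kmpp_init(data, n_components, rng):
    """Pick n_components rows of `data` as initial centers, k-means++ style."""
    n_samples = data.shape[0]
    # First center: uniform at random (this is all that 'random' init does per center).
    chosen = [rng.integers(n_samples)]
    # Squared distance from each point to its nearest chosen center so far.
    closest_sq = np.sum((data - data[chosen[0]]) ** 2, axis=1)
    for _ in range(1, n_components):
        # Assumption: sample proportionally to squared distance (textbook k-means++).
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n_samples, p=probs)
        chosen.append(idx)
        new_sq = np.sum((data - data[idx]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return data[chosen]

rng = np.random.default_rng(0)
# Three well-separated blobs of 50 points each in 2 dimensions.
blobs = np.concatenate(
    [rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in (0.0, 5.0, 10.0)]
)
centers = kmpp_init(blobs, n_components=3, rng=rng)
print(centers.shape)  # (3, 2)
```

Points that are already centers have zero weight, so the same datapoint is never picked twice.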
- Attributes
- cluster_centers_ : nd.array, shape (n_components, P)
Coordinates of cluster centers
- labels_ : np.array, shape (N,)
Labels of each point
- inertia_ : float
Sum of squared distances of samples to their cluster center.
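To make the attributes concrete, here is a minimal dense-data sketch of how labels and inertia relate to a set of cluster centers, and how a relative tolerance on inertia can stop the iteration. The helper names `assign` and `lloyd` are hypothetical, and the stopping rule shown is an assumed reading of `tol`, not the library's confirmed criterion.

```python
import numpy as np

def assign(data, centers):
    """Return (labels, inertia) for dense data and centers."""
    # Squared Euclidean distance from every sample to every center: shape (N, K).
    sq_dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    # Inertia: sum of squared distances of samples to their assigned center.
    inertia = sq_dists[np.arange(data.shape[0]), labels].sum()
    return labels, inertia

def lloyd(data, centers, tol=1e-4, max_iter=300):
    """Plain Lloyd iterations with an assumed relative-inertia stopping rule."""
    labels, inertia = assign(data, centers)
    for _ in range(max_iter):
        # Move each center to the mean of its members; empty clusters stay put.
        new_centers = centers.copy()
        for k in range(centers.shape[0]):
            members = data[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        centers = new_centers
        labels, new_inertia = assign(data, centers)
        # Assumed rule: stop once the relative drop in inertia is at most tol.
        converged = inertia - new_inertia <= tol * inertia
        inertia = new_inertia
        if converged:
            break
    return centers, labels, inertia

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 0.1, (40, 2)), rng.normal(3.0, 0.1, (40, 2))])
centers, labels, inertia = lloyd(data, data[[0, 40]].copy())
print(labels.shape, centers.shape)  # (80,) (2, 2)
```

Running this from several different initializations and keeping the run with the lowest inertia is the idea behind `n_init`.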
Methods
- apply_HD(self, X): Apply the preconditioning transform to X.
- apply_mask(self, X, mask): Apply the mask to X.
- fit(self[, X, HDX, RHDX]): Compute k-means clustering and assign labels to datapoints.
- fit_sparsifier(self[, X, HDX, RHDX]): Fit the sparsifier to specified data.
- invert_HD(self, HDX): Apply the inverse of HD to HDX.
- invert_mask_bool(self): Compute the mask inverse.
- pairwise_distances(self[, Y]): Computes the pairwise distance between each sparsified sample, or between each sparsified sample and each full sample in Y if Y is given.
- pairwise_mahalanobis_distances(self, means, …): Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).
- weighted_means(self, W): Computes weighted full means of sparsified samples.
- weighted_means_and_variances(self, W): Computes weighted full means and variances of sparsified samples.
- fit(self, X=None, HDX=None, RHDX=None)
Compute k-means clustering and assign labels to datapoints. At least one of the parameters must be set.
- Parameters
- X : nd.array, shape (N, P), optional
Dense, raw data. Defaults to None.
- HDX : nd.array, shape (N, P), optional
Dense, transformed data. Defaults to None.
- RHDX : nd.array, shape (N, Q), optional
Subsampled, transformed data. Defaults to None.
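The three inputs differ only in how much preprocessing has already been applied. The NumPy sketch below illustrates the shapes involved under stated assumptions: `D` is a random sign flip, `H` is a random orthogonal matrix standing in for whatever transform the library uses, and `mask` holds Q kept feature indices per sample. None of these names or choices are taken from the library itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q = 6, 8, 3

# X: dense, raw data, shape (N, P).
X = rng.normal(size=(N, P))

# Assumed preconditioning: random sign flips (D), then an orthogonal matrix (H).
# H here is a stand-in built by QR, not the library's actual transform.
D = rng.choice([-1.0, 1.0], size=P)
H, _ = np.linalg.qr(rng.normal(size=(P, P)))

# HDX: dense, transformed data, shape (N, P). Each row is H @ (D * x).
HDX = (X * D) @ H.T

# Assumed mask layout: Q distinct feature indices kept for each sample, shape (N, Q).
mask = np.stack([np.sort(rng.choice(P, size=Q, replace=False)) for _ in range(N)])

# RHDX: subsampled, transformed data, shape (N, Q).
RHDX = np.take_along_axis(HDX, mask, axis=1)

# Because H is orthogonal and D is a sign flip, the transform can be undone.
X_back = (HDX @ H) * D

print(X.shape, HDX.shape, RHDX.shape)  # (6, 8) (6, 8) (6, 3)
```

Under these assumptions, passing RHDX alone gives the estimator the least information, which is why the initialization options fall back to RHDX only when HDX is unavailable.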