K-Means

class sparseklearn.kmeans.KMeans(n_components=8, init='kmpp', tol=0.0001, n_init=10, n_passes=1, max_iter=300, means_init_array=None, **kwargs)

Sparsified K-Means clustering.

Parameters
n_components : int, default: 8

The number of clusters.

init : {ndarray, ‘kmpp’, ‘random’}, default: ‘kmpp’

Initialization method:

ndarray: shape (n_components, P). Initial cluster centers, which must already be transformed.

‘kmpp’: picks initial cluster centers from the data with probability proportional to the distance of each datapoint to the current initial means. More expensive than ‘random’, but usually converges better. The centers are drawn from HDX if the sparsifier has access to it; otherwise they are drawn from RHDX.

‘random’: picks initial cluster centers uniformly at random from the datapoints. The centers are drawn from HDX if the sparsifier has access to it; otherwise they are drawn from RHDX.
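The distance-weighted seeding described for ‘kmpp’ can be sketched in plain Python. This is an illustration, not sparseklearn's code: the function name `seed_centers` and its `power` argument are hypothetical, the data is dense, and the choice between HDX and RHDX is not modeled. The text above describes weights proportional to distance (`power=1`); classic k-means++ uses squared distance (`power=2`), so the sketch exposes both.

```python
import math
import random


def seed_centers(points, k, rng, power=1):
    """Pick k initial centers from `points` (a list of equal-length tuples).

    The first center is drawn uniformly. Each later center is drawn with
    probability proportional to (distance to nearest chosen center) ** power.
    """
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        weights = [min(math.dist(p, c) for c in centers) ** power
                   for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            centers.append(points[rng.randrange(len(points))])
        else:
            # Already-chosen points have weight 0 and cannot be drawn again.
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


rng = random.Random(0)
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
initial = seed_centers(data, k=3, rng=rng)
```

Because points far from the current centers get larger weights, the seeds tend to spread across the data, which is where the extra cost relative to ‘random’ comes from.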

n_init : int, default: 10

Number of times to run k-means on new initializations. The best results are kept.

max_iter : int, default: 300

Maximum number of iterations for each run.

tol : float, default: 1e-4

Relative tolerance with respect to inertia, used to declare convergence.
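How n_init, max_iter, and tol interact can be sketched with a plain Lloyd iteration on dense data. This is an illustrative sketch, not sparseklearn's implementation: the helper names (`assign`, `update`, `run_once`, `best_of`) are hypothetical, sparsification is ignored entirely, and the stopping rule (relative decrease in inertia no larger than tol) is an assumption about how the tolerance is applied.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def inertia(points, centers, labels):
    """Sum of squared distances of points to their assigned center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def update(points, labels, centers):
    """Move each center to the mean of its points; empty clusters stay put."""
    new = []
    for j, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(old)
    return new


def run_once(points, k, rng, max_iter=300, tol=1e-4):
    """One run from a fresh uniform-random initialization."""
    centers = rng.sample(points, k)
    labels = assign(points, centers)
    best = inertia(points, centers, labels)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        labels = assign(points, centers)
        cur = inertia(points, centers, labels)
        # Assumed rule: stop once the relative improvement is at most tol.
        converged = best - cur <= tol * best
        best = cur
        if converged:
            break
    return centers, labels, best


def best_of(points, k, n_init=10, seed=0, **kwargs):
    """Repeat run_once n_init times and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [run_once(points, k, rng, **kwargs) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers, labels, best_inertia = best_of(points, k=2)
```

Restarting matters because a single run can stall in a poor local optimum; keeping the lowest-inertia run is one natural reading of "the best results are kept".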

Attributes
cluster_centers_ : np.array, shape (n_components, P)

Coordinates of cluster centers.

labels_ : np.array, shape (N,)

Label of each point.

inertia_ : float

Sum of squared distances of samples to their cluster center.

Methods

apply_HD(self, X)

Apply the preconditioning transform to X.

apply_mask(self, X, mask)

Apply the mask to X.

fit(self[, X, HDX, RHDX])

Compute k-means clustering and assign labels to datapoints.

fit_sparsifier(self[, X, HDX, RHDX])

Fit the sparsifier to specified data.

invert_HD(self, HDX)

Apply the inverse of HD to HDX.

invert_mask_bool(self)

Compute the mask inverse.

pairwise_distances(self[, Y])

Computes the pairwise distances between sparsified samples, or between each sparsified sample and each full sample in Y if Y is given.

pairwise_mahalanobis_distances(self, means, …)

Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).

weighted_means(self, W)

Computes weighted full means of sparsified samples.

weighted_means_and_variances(self, W)

Computes weighted full means and variances of sparsified samples.
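One plausible reading of the distance and weighted-mean helpers above can be sketched in plain Python. Everything here is an assumption made for illustration: a sparsified sample is represented as a hypothetical `(kept_indices, values)` pair, distances are measured only on the kept coordinates with no rescaling, and each coordinate of a mean is averaged over the samples that kept it. The real methods may differ in representation, scaling, and signature.

```python
import math


def dist_to_full(kept, values, y):
    """Distance between a sparsified sample and a full vector y,
    measured only on the coordinates the sample kept (no rescaling)."""
    return math.sqrt(sum((v - y[i]) ** 2 for i, v in zip(kept, values)))


def weighted_means(samples, W, P):
    """Weighted full-length means of sparsified samples.

    samples: list of (kept_indices, values) pairs
    W:       W[n][k] is the weight of sample n for mean k
    P:       full dimension
    Coordinates that no weighted sample kept are set to 0.0.
    """
    K = len(W[0])
    means = []
    for k in range(K):
        num = [0.0] * P
        den = [0.0] * P
        for (kept, values), w in zip(samples, W):
            for i, v in zip(kept, values):
                num[i] += w[k] * v
                den[i] += w[k]
        means.append([n / d if d > 0 else 0.0 for n, d in zip(num, den)])
    return means


# Two samples in P = 3 dimensions, each keeping Q = 2 coordinates.
samples = [((0, 1), (1.0, 2.0)), ((1, 2), (4.0, 6.0))]
W = [[1.0], [1.0]]  # a single mean, equal weights
means = weighted_means(samples, W, P=3)
d = dist_to_full((0, 2), (0.0, 0.0), [3.0, 99.0, 4.0])
```

In this toy case coordinate 1 is kept by both samples and is averaged, while coordinates 0 and 2 each come from a single sample; the distance ignores the unkept middle coordinate of the full vector.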

fit(self, X=None, HDX=None, RHDX=None)

Compute k-means clustering and assign labels to datapoints. At least one of X, HDX, or RHDX must be provided.

Parameters
X : np.array, shape (N, P), optional

Dense, raw data. Defaults to None.

HDX : np.array, shape (N, P), optional

Dense, transformed data. Defaults to None.

RHDX : np.array, shape (N, Q), optional

Subsampled, transformed data. Defaults to None.
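The relationship between the three inputs (raw X, transformed HDX, subsampled RHDX) can be sketched as follows. This is a sketch under stated assumptions, not the library's transform: it assumes D is a diagonal matrix of random signs and H is an orthonormal Walsh-Hadamard transform (the actual preconditioner may be different, and P must be a power of two here), and the helper names `hadamard`, `precondition`, and `subsample` are hypothetical.

```python
import math
import random


def hadamard(x):
    """Orthonormal Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    h = 1
    while h < len(x):
        for start in range(0, len(x), 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(len(x))
    return [v * scale for v in x]


def precondition(X, signs):
    """HDX: flip signs (D), then mix coordinates (H). Shape stays (N, P)."""
    return [hadamard([s * v for s, v in zip(signs, row)]) for row in X]


def subsample(HDX, Q, rng):
    """RHDX and its mask: keep Q random coordinates of each row."""
    mask = [sorted(rng.sample(range(len(row)), Q)) for row in HDX]
    RHDX = [[row[i] for i in idx] for row, idx in zip(HDX, mask)]
    return RHDX, mask


rng = random.Random(0)
N, P, Q = 3, 4, 2
X = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
signs = [rng.choice((-1.0, 1.0)) for _ in range(P)]
HDX = precondition(X, signs)       # shape (N, P)
RHDX, mask = subsample(HDX, Q, rng)  # shape (N, Q)
```

Because the assumed transform is orthonormal, each row of HDX has the same Euclidean norm as the corresponding row of X, so distances between samples are unchanged before subsampling.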