Gaussian Mixture Models

class sparseklearn.gmm.GaussianMixture(n_components=3, covariance_type='spherical', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmpp', means_init=None, covariances_init=None, weights_init=None, predict_training_data=False, **kwargs)

Sparsified Gaussian mixture model.

Fit a Gaussian mixture model to sparsified data. Diagonal and spherical covariances are supported.

Parameters
n_components : int, defaults to 3.

The number of components (clusters) to fit.

covariance_type : {‘spherical’, ‘diag’}, defaults to ‘spherical’.

The form of the covariance matrix.

tol : float, defaults to 1e-3.

The convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold.

reg_covar : float, defaults to 1e-6.

Non-negative regularization added to the diagonal of the covariance to ensure it’s positive.

max_iter : int, defaults to 100.

The maximum number of EM iterations to perform.

n_init : int, defaults to 1.

The number of initializations to perform. The best results are kept.

init_params : {‘kmpp’, ‘random’}, defaults to ‘kmpp’.

The method used to initialize the weights, the means and the precisions. If ‘kmpp’, the initial means are chosen using the k-means++ algorithm. If ‘random’, initial means are chosen at random from the input data.

means_init : nd.array, shape (n_components, P), optional.

The user-provided initial means. Defaults to None, in which case the means are initialized using the init_params method. P is the number of features in the full-dimensional space.

predict_training_data : bool, defaults to False.

Whether to predict labels for the training data.

Attributes
weights_ : nd.array, shape (n_components,)

The weight of each mixture component.

means_ : nd.array, shape (n_components, P)

The mean of each mixture component.

covariances_ : nd.array

The covariance of each mixture component. The shape depends on covariance_type: (n_components,) if spherical and (n_components, P) if diag.

converged_ : bool

True if fit() converged, False otherwise.
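
A minimal usage sketch (hedged: it assumes synthetic dense data and that any sparsifier-specific options forwarded through **kwargs can be left at their defaults; only the documented parameters and attributes above are used):

    import numpy as np
    from sparseklearn.gmm import GaussianMixture

    # Two well-separated synthetic clusters: N = 200 samples, P = 20 features.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 20)),
                   rng.normal(loc=5.0, scale=1.0, size=(100, 20))])

    # Documented constructor parameters only; sparsifier-specific options
    # would be forwarded through **kwargs (not shown here).
    gmm = GaussianMixture(n_components=2,
                          covariance_type='diag',
                          predict_training_data=True)
    gmm.fit(X=X)

    print(gmm.weights_)        # shape (n_components,)
    print(gmm.means_.shape)    # (n_components, P)
    print(gmm.converged_)      # True if EM converged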

Methods

apply_HD(self, X)

Apply the preconditioning transform to X.

apply_mask(self, X, mask)

Apply the mask to X.

fit(self[, X, HDX, RHDX, y])

Estimate model parameters using the EM algorithm.

fit_sparsifier(self[, X, HDX, RHDX])

Fit the sparsifier to specified data.

invert_HD(self, HDX)

Apply the inverse of HD to HDX.

invert_mask_bool(self)

Compute the mask inverse.

pairwise_distances(self[, Y])

Computes the pairwise distances between the sparsified samples, or between each sparsified sample and each full sample in Y if Y is given.

pairwise_mahalanobis_distances(self, means, …)

Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).

predict(self, X)

Predict the labels for the data samples in X using the trained model.

weighted_means(self, W)

Computes weighted full means of sparsified samples.

weighted_means_and_variances(self, W)

Computes weighted full means and variances of sparsified samples.

fit(self, X=None, HDX=None, RHDX=None, y=None)

Estimate model parameters using the EM algorithm.

Fits the model n_init times and keeps the parameters with which the model has the largest likelihood. Each trial performs at most max_iter iterations of EM; if convergence is not reached, a ConvergenceWarning is raised.

At least one of X, HDX, or RHDX must be passed.

Parameters
X : nd.array, shape (N, P), optional

defaults to None. Dense, raw data.

HDX : nd.array, shape (N, P), optional

defaults to None. Dense, transformed data.

RHDX : nd.array, shape (N, Q), optional

defaults to None. Subsampled, transformed data.

y : nd.array, shape (N,), optional

True labels.
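
For example, a fit from dense raw data alone (a sketch under the documented signature; HDX and RHDX are omitted and left to the model's internal sparsifier):

    # X has shape (N, P); only the raw data is supplied.
    gmm = GaussianMixture(n_components=3, covariance_type='spherical')
    gmm.fit(X=X)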

predict(self, X)

Predict the labels for the data samples in X using the trained model.

Parameters
X : nd.array, shape (n_samples, Q)

Array of Q-dimensional data points. Each row corresponds to a single data point. X is assumed to be preconditioned and subsampled.

Returns
labels : array, shape (n_samples,)

Component labels.
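
A short prediction sketch, assuming gmm has been fit and that X_new has already been preconditioned and subsampled to Q features per sample (for example via the apply_HD and apply_mask methods listed above; how the mask is obtained is library-specific and not shown):

    # X_new must already be preconditioned and subsampled: shape (n_samples, Q).
    labels = gmm.predict(X_new)
    print(labels.shape)        # (n_samples,)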