The Sparsifier Object

class sparseklearn.sparsifier.Sparsifier(num_feat_full, num_feat_comp, num_samp, mask=None, transform='dct', D_indices=None, num_feat_shared=0, random_state=None)

Sparsifier.

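Compresses data through sparsification: each sample is (optionally) preconditioned and then subsampled to num_feat_comp features. Supports several operations directly on the sparsified data.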

Parameters

num_feat_full (int): Dimension of a full sample.
num_feat_comp (int): The number of dimensions to keep in the compressed data.
num_samp (int): The number of samples in the dataset.
transform ({‘dct’, None}, defaults to ‘dct’): The preconditioning transform. Determines which form of H to use in the preconditioning transform HD. Any option other than None also applies the diagonal matrix D (which can be set using the D_indices parameter). The discrete cosine transform (‘dct’) is currently the only supported method.
mask (np.ndarray, shape (num_samp, num_feat_comp), optional, defaults to None): The user-provided mask. If None, the mask is generated using the generate_mask method.
num_feat_shared (int, defaults to 0): The minimum number of dimensions to be shared across all samples in the compressed data.
D_indices (np.ndarray, shape (n_signflips,), optional, defaults to None): The user-provided indices of the diagonal entries of the preconditioning matrix D that equal -1. If None, they are generated using the generate_D_indices method.

Attributes

mask (np.ndarray, shape (num_samp, num_feat_comp)): The mask used to sparsify the data. An array of integers; each row holds the indices of the entries kept for that sample.
D_indices (np.ndarray, shape (n_signflips,)): Defines the preconditioning matrix D. An array of integers giving the indices of the diagonal entries of D that equal -1.

Methods

apply_HD(self, X): Apply the preconditioning transform to X.
apply_mask(self, X, mask): Apply the mask to X.
fit_sparsifier(self[, X, HDX, RHDX]): Fit the sparsifier to specified data.
invert_HD(self, HDX): Apply the inverse of HD to HDX.
invert_mask_bool(self): Compute the mask inverse.
pairwise_distances(self[, Y]): Compute the pairwise distances between sparsified samples, or between each sparsified sample and each full sample in Y if Y is given.
pairwise_mahalanobis_distances(self, means, …): Compute the Mahalanobis distance between each compressed sample and each full mean (each row of means).
weighted_means(self, W): Compute weighted full means of sparsified samples.
weighted_means_and_variances(self, W): Compute weighted full means and variances of sparsified samples.

apply_mask(self, X, mask)

Apply the mask to X.

Parameters

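X (np.ndarray, shape (n, P)): The data to mask. Each row is a datapoint.
mask (np.ndarray, shape (n, Q)): Integer indices of the entries to keep in each row of X.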

Returns

RX (np.ndarray, shape (n, Q)): Masked X. The nth row of RX is X[n][mask[n]].
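
The masking rule RX[n] = X[n][mask[n]] can be written in NumPy without a Python loop. The sketch below is an illustration only, not the library's implementation; the function name apply_mask_sketch is hypothetical.

```python
import numpy as np

def apply_mask_sketch(X, mask):
    """Hypothetical stand-in for apply_mask: row n of the result is X[n][mask[n]]."""
    # take_along_axis picks, for every row n, the columns listed in mask[n].
    return np.take_along_axis(np.asarray(X), np.asarray(mask), axis=1)

X = np.arange(6).reshape(2, 3)      # two datapoints with P = 3 features
mask = np.array([[0, 2], [1, 2]])   # keep Q = 2 features per datapoint
RX = apply_mask_sketch(X, mask)     # [[0, 2], [4, 5]]
```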

apply_HD(self, X)

Apply the preconditioning transform to X.

Parameters

X (np.ndarray, shape (n, P)): The data to precondition. Each row is a datapoint.

Returns

HDX (np.ndarray, shape (n, P)): The transformed data.
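
A minimal sketch of HD for transform='dct', assuming D flips the signs of the columns listed in D_indices and H is the orthonormal DCT-II applied to each row (the same transform as scipy.fft.dct(..., norm='ortho')). The exact normalization the library uses is an assumption here, and dct_matrix and apply_hd_sketch are hypothetical helpers.

```python
import numpy as np

def dct_matrix(P):
    """Orthonormal DCT-II matrix C (assumed form of H): C @ x is the DCT of x."""
    k = np.arange(P)[:, None]
    n = np.arange(P)[None, :]
    C = np.sqrt(2.0 / P) * np.cos(np.pi * (2 * n + 1) * k / (2 * P))
    C[0, :] = np.sqrt(1.0 / P)  # the k = 0 row gets the smaller scale factor
    return C

def apply_hd_sketch(X, D_indices):
    """Hypothetical stand-in for apply_HD: apply D (sign flips), then H (DCT) to each row."""
    DX = np.array(X, dtype=float, copy=True)
    DX[:, D_indices] *= -1.0              # D: flip the listed columns
    return DX @ dct_matrix(DX.shape[1]).T  # H: row-wise, so multiply by C transposed

X = np.array([[1.0, 1.0, 1.0, 1.0]])
HDX = apply_hd_sketch(X, np.array([], dtype=int))  # approximately [[2, 0, 0, 0]]
```

Because H is orthonormal and D only flips signs, this HD preserves each row's Euclidean norm.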

invert_HD(self, HDX)

Apply the inverse of HD to HDX.

Parameters

HDX (np.ndarray, shape (n, P)): The preconditioned data. Each row is a datapoint.

Returns

X (np.ndarray, shape (n, P)): The raw data.
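
Under the assumption that H is an orthonormal DCT-II and D is a diagonal of ±1 entries at D_indices, inverting HD is cheap: the inverse of an orthogonal matrix is its transpose, and D is its own inverse. The helpers below (dct_matrix, invert_hd_sketch) are hypothetical and only illustrate that reasoning.

```python
import numpy as np

def dct_matrix(P):
    """Orthonormal DCT-II matrix C (assumed form of H)."""
    k = np.arange(P)[:, None]
    n = np.arange(P)[None, :]
    C = np.sqrt(2.0 / P) * np.cos(np.pi * (2 * n + 1) * k / (2 * P))
    C[0, :] = np.sqrt(1.0 / P)
    return C

def invert_hd_sketch(HDX, D_indices):
    """Hypothetical stand-in for invert_HD: undo H (inverse DCT), then undo D (sign flips)."""
    C = dct_matrix(HDX.shape[1])
    # C is orthogonal, so C^{-1} = C^T; for row vectors that means multiplying by C.
    X = np.asarray(HDX, dtype=float) @ C
    # D has entries +/-1 on its diagonal, so applying it again undoes it.
    X[:, D_indices] *= -1.0
    return X

X = invert_hd_sketch(np.array([[2.0, 0.0, 0.0, 0.0]]), np.array([1]))  # approximately [[1, -1, 1, 1]]
```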

invert_mask_bool(self)

Compute the mask inverse.

The mask is an array indicating which dimensions are kept for each datapoint. The inverse mask indicates, for each dimension, which datapoints keep it. For computational efficiency, the inverse mask is stored as a sparse boolean array, whereas the mask is a (smaller) dense integer array.

Returns

mask_inverse (sparse.csr_matrix, bool, shape (P, N)): The mask inverse. The (i, j) entry is 1 if the jth datapoint keeps the ith dimension under the mask, and 0 otherwise; in other words, it is 1 exactly when i is in the list mask[j].
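
The inverse mask can be assembled in one shot from coordinate triplets: every kept entry mask[j, q] = i contributes a True at position (i, j). This sketch uses SciPy's csr_matrix constructor; invert_mask_sketch is a hypothetical name, and it assumes no index is repeated within a row of mask.

```python
import numpy as np
from scipy import sparse

def invert_mask_sketch(mask, P):
    """Hypothetical stand-in for invert_mask_bool: entry (i, j) is True iff i is in mask[j]."""
    N, Q = mask.shape
    rows = mask.ravel()                 # dimension index i of each kept entry
    cols = np.repeat(np.arange(N), Q)   # datapoint index j of each kept entry
    data = np.ones(N * Q, dtype=bool)
    return sparse.csr_matrix((data, (rows, cols)), shape=(P, N))

mask = np.array([[0, 2], [1, 2]])       # N = 2 datapoints, Q = 2 kept dimensions each
inv = invert_mask_sketch(mask, P=3)
# inv.toarray() -> [[True, False], [False, True], [True, True]]
```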

fit_sparsifier(self, X=None, HDX=None, RHDX=None)

Fit the sparsifier to specified data.

Sets self.RHDX, the subsampled, preconditioned data. At least one of the parameters must be given. If RHDX is passed, X and HDX are ignored; if HDX is passed, X is ignored.

Parameters

X (np.ndarray, shape (num_samp, num_feat_full), defaults to None): Dense, raw data.
HDX (np.ndarray, shape (num_samp, num_feat_full), defaults to None): Dense, preconditioned data.
RHDX (np.ndarray, shape (num_samp, num_feat_comp), defaults to None): Subsampled, preconditioned data.
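
The precedence rules (RHDX, then HDX, then X) and the R·H·D pipeline can be summarized in a short sketch. It assumes D flips the signs of the columns in D_indices, H is an orthonormal DCT-II along each row, and R keeps the columns listed in each row of mask; resolve_rhdx and dct_matrix are hypothetical helpers, not part of the library.

```python
import numpy as np

def dct_matrix(P):
    """Orthonormal DCT-II matrix (assumed form of H)."""
    k = np.arange(P)[:, None]
    n = np.arange(P)[None, :]
    C = np.sqrt(2.0 / P) * np.cos(np.pi * (2 * n + 1) * k / (2 * P))
    C[0, :] = np.sqrt(1.0 / P)
    return C

def resolve_rhdx(mask, D_indices, X=None, HDX=None, RHDX=None):
    """Hypothetical sketch of what fit_sparsifier stores as RHDX."""
    if RHDX is not None:              # highest priority: X and HDX are ignored
        return np.asarray(RHDX)
    if HDX is None:                   # fall back to raw X: apply D, then H
        if X is None:
            raise ValueError("At least one of X, HDX, RHDX must be given.")
        DX = np.array(X, dtype=float, copy=True)
        DX[:, D_indices] *= -1.0
        HDX = DX @ dct_matrix(DX.shape[1]).T
    # R: keep only the masked entries of each preconditioned row
    return np.take_along_axis(np.asarray(HDX), mask, axis=1)
```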

pairwise_distances(self, Y=None)

Computes the pairwise distances between sparsified samples, or between each sparsified sample and each full sample in Y if Y is given.

Parameters

Y (np.ndarray, shape (K, P), optional, defaults to None): Full, transformed samples.

Returns

distances (np.ndarray, shape (K or N, N)): Distances between each pair of samples (if Y is None), or distances between each sample and each row in Y.
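
For the case where Y is given, one natural way to compare a sparsified sample with a full sample is to measure the distance only over the coordinates that sample kept. The sketch below assumes exactly that, with no rescaling for the dropped coordinates; the library may normalize differently, and distances_to_full_sketch is a hypothetical name.

```python
import numpy as np

def distances_to_full_sketch(RHDX, mask, Y):
    """Hypothetical sketch: Euclidean distance from each sparsified sample to each row of Y,
    measured only over the coordinates that sample kept (no rescaling)."""
    # Y[:, mask] has shape (K, N, Q): entry [k, n, q] is Y[k, mask[n, q]].
    diff = Y[:, mask] - RHDX[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))   # shape (K, N)

RHDX = np.array([[1.0, 2.0]])                  # N = 1 sample that kept features 0 and 1
mask = np.array([[0, 1]])
Y = np.array([[1.0, 2.0, 100.0], [4.0, 6.0, 0.0]])   # K = 2 full samples, P = 3
d = distances_to_full_sketch(RHDX, mask, Y)    # [[0.], [5.]]
```

Feature 2 of Y is ignored entirely because the sample did not keep it.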

weighted_means(self, W)

Computes weighted full means of sparsified samples. This method is currently also used for hard assignments (binary W), but it is not yet optimized for that case: zeros in W are still multiplied through.

Parameters

W (np.ndarray, shape (N, K)): Weights. Each row corresponds to a sample and each column to a set of weights. The columns of W should sum to 1. Each column is treated independently of the others.

Returns

means (np.ndarray, shape (K, P)): Weighted full means. Each row corresponds to one independent set of weights (for example, a binary W with K columns gives the means of K clusters).
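
A sketch of how full-dimensional means can be recovered from sparsified samples, assuming each feature's mean is normalized by the total weight of the samples that actually kept that feature (so the normalization does not depend on the columns of W summing to 1). The name weighted_means_sketch is hypothetical, and leaving unobserved features at 0 is an illustrative choice.

```python
import numpy as np

def weighted_means_sketch(RHDX, mask, W, P):
    """Hypothetical sketch: per-feature weighted means over samples that kept each feature."""
    N = RHDX.shape[0]
    rows = np.arange(N)[:, None]
    S = np.zeros((N, P))
    S[rows, mask] = RHDX          # sparsified values scattered back to full width
    M = np.zeros((N, P))
    M[rows, mask] = 1.0           # 1 where sample n kept feature p
    num = W.T @ S                 # (K, P) weighted sums of observed values
    den = W.T @ M                 # (K, P) total weight that observed each feature
    # Features kept by no weighted sample are left at 0 instead of dividing by zero.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

RHDX = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[0, 1], [1, 2]])
means = weighted_means_sketch(RHDX, mask, np.ones((2, 1)), P=3)  # [[1.0, 2.5, 4.0]]
```

With a binary W such as np.eye(2), each row of the result is the mean of one "cluster".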

weighted_means_and_variances(self, W)

Computes weighted full means and variances of sparsified samples. This method is currently also used for hard assignments (binary W), but it is not yet optimized for that case: zeros in W are still multiplied through.

Parameters

W (np.ndarray, shape (N, K)): Weights. Each row corresponds to a sample and each column to a set of weights. The columns of W should sum to 1. Each column is treated independently of the others.

Returns

means (np.ndarray, shape (K, P)): Weighted full means. Each row corresponds to one independent set of weights (for example, a binary W with K columns gives the means of K clusters).
variances (np.ndarray, shape (K, P)): Weighted full variances. Each row corresponds to one independent set of weights (for example, a binary W with K columns gives the variances of K clusters).
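
Variances can be obtained in the same pass as the means by also averaging the squared values, using Var[x] = E[x²] − E[x]² per feature. This sketch assumes the same per-feature normalization as for the means (only samples that kept a feature count toward it) and a biased, weight-normalized variance; weighted_means_and_variances_sketch is a hypothetical name.

```python
import numpy as np

def weighted_means_and_variances_sketch(RHDX, mask, W, P):
    """Hypothetical sketch: per-feature weighted mean and (biased) variance
    over the samples that kept each feature."""
    N = RHDX.shape[0]
    rows = np.arange(N)[:, None]
    S = np.zeros((N, P))
    S[rows, mask] = RHDX          # observed values at full width, 0 elsewhere
    M = np.zeros((N, P))
    M[rows, mask] = 1.0           # indicator of kept entries
    den = W.T @ M                 # (K, P) weight that observed each feature

    def wavg(A):
        num = W.T @ A
        return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    means = wavg(S)
    variances = wavg(S ** 2) - means ** 2   # E[x^2] - E[x]^2 per feature
    return means, variances

RHDX = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[0, 1], [1, 2]])
means, variances = weighted_means_and_variances_sketch(RHDX, mask, np.ones((2, 1)), P=3)
# means -> [[1.0, 2.5, 4.0]], variances -> [[0.0, 0.25, 0.0]]
```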

pairwise_mahalanobis_distances(self, means, covariances, covariance_type)

Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).

Parameters

means (np.ndarray, shape (K, P)): The means with which to take the Mahalanobis distances. Each row of means is a single mean in P-dimensional space.
covariances (np.ndarray, shape (K, P) or shape (P,)): The non-zero entries of the covariance matrix. If covariance_type is ‘spherical’, must have shape (P,). If covariance_type is ‘diag’, must have shape (K, P).
covariance_type ({‘spherical’, ‘diag’}): The form of the covariance matrix.

Returns

distances (np.ndarray, shape (N, K)): The pairwise Mahalanobis distances.
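
For covariance_type='diag', a Mahalanobis distance restricted to each sample's kept coordinates can be computed with the same indexing trick as for the means. This sketch covers only the 'diag' case and assumes the library returns the (unsquared) distance without rescaling for dropped coordinates; neither detail is stated above, and mahalanobis_diag_sketch is a hypothetical name.

```python
import numpy as np

def mahalanobis_diag_sketch(RHDX, mask, means, covariances):
    """Hypothetical sketch for covariance_type='diag': Mahalanobis distance from each
    sparsified sample to each mean, using only the coordinates that sample kept."""
    # means[:, mask] has shape (K, N, Q); move N to the front to get (N, K, Q).
    mu = means[:, mask].transpose(1, 0, 2)
    var = covariances[:, mask].transpose(1, 0, 2)
    diff = RHDX[:, None, :] - mu
    return np.sqrt(np.sum(diff ** 2 / var, axis=2))   # shape (N, K)

RHDX = np.array([[1.0, 2.0]])
mask = np.array([[0, 1]])
means = np.zeros((1, 3))
covariances = np.array([[1.0, 4.0, 1.0]])
d = mahalanobis_diag_sketch(RHDX, mask, means, covariances)  # [[sqrt(2)]]
```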
