The Sparsifier Object
-
class sparseklearn.sparsifier.Sparsifier(num_feat_full, num_feat_comp, num_samp, mask=None, transform='dct', D_indices=None, num_feat_shared=0, random_state=None)
Compresses data through sparsification. Permits several operations on sparsified data.
- Parameters
- num_feat_full : int
Dimension of a full sample.
- num_feat_comp : int
The number of dimensions to keep in the compressed data.
- num_samp : int
The number of samples in the dataset.
- transform : {‘dct’, None}, defaults to ‘dct’
The preconditioning transform. Determines which form of H to use in the preconditioning transform HD. Any value other than None also uses the diagonal matrix D (which can be set with the D_indices parameter). The discrete cosine transform (‘dct’) is currently the only supported method.
- mask : np.ndarray, shape (n_datapoints, dim_mask), optional
Defaults to None. The user-provided mask. If None, the mask is generated using the generate_mask method.
- num_feat_shared : int, defaults to 0
The minimum number of dimensions to be shared across all samples in the compressed data.
- D_indices : np.ndarray, shape (n_datapoints,), optional
Defaults to None. The user-provided diagonal of the preconditioning matrix D. If None, it is generated using the generate_D_indices method.
- Attributes
- mask : np.ndarray, shape (num_samp, num_feat_comp)
The mask used to sparsify the data. Array of integers; each row holds the indices of the entries that were kept for that sample.
- D_indices : np.ndarray, shape (n_signflips,)
Defines the preconditioning matrix D. Array of integers: the indices of the diagonal entries of D with sign -1.
Methods
- apply_HD(self, X): Apply the preconditioning transform to X.
- apply_mask(self, X, mask): Apply the mask to X.
- fit_sparsifier(self[, X, HDX, RHDX]): Fit the sparsifier to specified data.
- invert_HD(self, HDX): Apply the inverse of HD to HDX.
- invert_mask_bool(self): Compute the mask inverse.
- pairwise_distances(self[, Y]): Computes the pairwise distance between each sparsified sample, or between each sparsified sample and each full sample in Y if Y is given.
- pairwise_mahalanobis_distances(self, means, …): Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).
- weighted_means(self, W): Computes weighted full means of sparsified samples.
- weighted_means_and_variances(self, W): Computes weighted full means and variances of sparsified samples.
-
apply_mask(self, X, mask)
Apply the mask to X.
- Parameters
- X : np.ndarray, shape (n, P)
- mask : np.ndarray, shape (n, Q)
- Returns
- RX : np.ndarray, shape (n, Q)
Masked X. The nth row of RX is X[n][mask[n]].
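The indexing rule RX[n] = X[n][mask[n]] can be reproduced with plain NumPy. The following is a minimal sketch of that rule, not the library's implementation; `apply_mask_sketch` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np

def apply_mask_sketch(X, mask):
    """Hypothetical stand-in for apply_mask: row n of the result is X[n][mask[n]]."""
    # take_along_axis picks, for each row n, the columns listed in mask[n].
    return np.take_along_axis(X, mask, axis=1)

X = np.arange(12).reshape(3, 4)             # 3 samples, P = 4 features
mask = np.array([[0, 2], [1, 3], [0, 3]])   # Q = 2 kept indices per sample
RX = apply_mask_sketch(X, mask)
print(RX)
```

Each sample keeps its own set of columns, so RX is dense with shape (n, Q) even though different rows refer to different original dimensions.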
-
apply_HD(self, X)
Apply the preconditioning transform to X.
- Parameters
- X : np.ndarray, shape (n, P)
The data to precondition. Each row is a datapoint.
- Returns
- HDX : np.ndarray, shape (n, P)
The transformed data.
-
invert_HD(self, HDX)
Apply the inverse of HD to HDX.
- Parameters
- HDX : np.ndarray, shape (n, P)
The preconditioned data. Each row is a datapoint.
- Returns
- X : np.ndarray, shape (n, P)
The raw data.
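To make the HD transform and its inverse concrete, here is a minimal NumPy sketch. It assumes H is an orthonormal DCT-II matrix and D is a diagonal matrix with -1 at the positions in D_indices and +1 elsewhere; the exact DCT type and normalization used by the library are not stated here, so treat those as assumptions. The names `dct_matrix`, `apply_HD_sketch`, and `invert_HD_sketch` are hypothetical.

```python
import numpy as np

def dct_matrix(P):
    """Orthonormal DCT-II matrix of size P x P (the normalization is an assumption)."""
    k = np.arange(P)[:, None]   # frequency index (rows)
    n = np.arange(P)[None, :]   # position index (columns)
    H = np.sqrt(2.0 / P) * np.cos(np.pi * (2 * n + 1) * k / (2 * P))
    H[0, :] /= np.sqrt(2.0)     # rescale the constant row so that H @ H.T = I
    return H

def sign_diagonal(P, D_indices):
    """Diagonal of D: -1 at D_indices, +1 elsewhere."""
    d = np.ones(P)
    d[D_indices] = -1.0
    return d

def apply_HD_sketch(X, D_indices):
    P = X.shape[1]
    d = sign_diagonal(P, D_indices)
    return (X * d) @ dct_matrix(P).T    # each row x becomes H @ (D @ x)

def invert_HD_sketch(HDX, D_indices):
    P = HDX.shape[1]
    d = sign_diagonal(P, D_indices)
    return (HDX @ dct_matrix(P)) * d    # uses D^-1 = D and H^-1 = H.T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
D_indices = np.array([1, 4, 6])
HDX = apply_HD_sketch(X, D_indices)
X_back = invert_HD_sketch(HDX, D_indices)
```

Under these assumptions HD is orthogonal, so it preserves the Euclidean norm of every row and the round trip recovers X up to floating-point error.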
-
invert_mask_bool(self)
Compute the mask inverse.
The mask is an array indicating which dimensions are kept for each datapoint. The inverse mask is an array indicating, for each dimension, which datapoints keep that dimension. For computational efficiency, the inverse mask is given as a sparse boolean array, whereas the mask is a (smaller) dense integer array.
- Returns
- mask_inverse : sparse.csr_matrix, bool, shape (P, N)
The mask inverse. The ij entry is 1 if the jth datapoint keeps the ith dimension under the mask, and 0 otherwise; in other words, 1 if i is in the list mask[j].
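The relationship between the mask and its inverse can be illustrated with a small NumPy sketch. For simplicity it builds a dense boolean array rather than the sparse CSR matrix described above; `invert_mask_bool_sketch` is a hypothetical helper, not the library's code.

```python
import numpy as np

def invert_mask_bool_sketch(mask, P):
    """Dense (P, N) boolean array with entry [i, j] True iff i is in mask[j]."""
    N, Q = mask.shape
    inverse = np.zeros((P, N), dtype=bool)
    # mask.ravel() lists kept dimensions row by row; repeat each sample index Q times to match.
    sample_index = np.repeat(np.arange(N), Q)
    inverse[mask.ravel(), sample_index] = True
    return inverse

mask = np.array([[0, 2], [1, 3], [0, 3]])   # N = 3 samples, Q = 2 kept dimensions each
mask_inverse = invert_mask_bool_sketch(mask, P=4)
print(mask_inverse.astype(int))
```

Row i of the result answers the question "which samples observed dimension i?", which is the lookup needed when aggregating per-dimension statistics.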
-
fit_sparsifier(self, X=None, HDX=None, RHDX=None)
Fit the sparsifier to specified data.
Sets self.RHDX, the subsampled, preconditioned data. At least one of the parameters must be set. If RHDX is passed, then X and HDX are ignored. If HDX is passed, then X is ignored.
- Parameters
- X : np.ndarray, shape (num_samp, num_feat_full), defaults to None
Dense, raw data.
- HDX : np.ndarray, shape (num_samp, num_feat_full), defaults to None
Dense, preconditioned data.
- RHDX : np.ndarray, shape (num_samp, num_feat_comp), defaults to None
Subsampled, preconditioned data.
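The precedence among the three inputs (RHDX over HDX over X) can be sketched as follows. This is an illustrative outline only: `resolve_RHDX` and the toy `precondition` and `subsample` callables are hypothetical stand-ins, and raising a ValueError when nothing is passed is an assumption, since the text above only states that at least one argument must be set.

```python
import numpy as np

def resolve_RHDX(X=None, HDX=None, RHDX=None, *, precondition, subsample):
    """Hypothetical helper mirroring the documented precedence RHDX > HDX > X."""
    if RHDX is not None:
        return RHDX                         # X and HDX are ignored
    if HDX is not None:
        return subsample(HDX)               # X is ignored
    if X is not None:
        return subsample(precondition(X))
    # Assumption: the source does not say which error (if any) is raised here.
    raise ValueError("at least one of X, HDX, RHDX must be provided")

# Toy stand-ins for the preconditioner and the mask, for illustration only.
mask = np.array([[0, 2], [1, 3]])
precondition = lambda A: -A
subsample = lambda A: np.take_along_axis(A, mask, axis=1)

X = np.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
from_X = resolve_RHDX(X=X, precondition=precondition, subsample=subsample)
from_HDX = resolve_RHDX(X=X, HDX=10 * X, precondition=precondition, subsample=subsample)
```

In the second call X is supplied but unused, because HDX takes precedence.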
-
pairwise_distances(self, Y=None)
Computes the pairwise distance between each sparsified sample, or between each sparsified sample and each full sample in Y if Y is given.
- Parameters
- Y : np.ndarray, shape (K, P), optional
Defaults to None. Full, transformed samples.
- Returns
- distances : np.ndarray, shape (K or N, N)
Distances between each pair of samples (if Y is None), or between each sample and each row of Y.
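One way to compare a sparsified sample with a full sample is to restrict the full sample to the dimensions that the sparsified sample kept. The sketch below does this for squared Euclidean distances. It is an assumption-laden illustration, not the library's implementation: the source does not state whether distances are squared or rescaled, and `masked_sq_distances` is a hypothetical name.

```python
import numpy as np

def masked_sq_distances(RX, mask, Y):
    """Squared distances between rows of Y (K, P) and sparsified samples RX (N, Q).

    Only the dimensions in mask[n] are compared for sample n. Returns shape (K, N).
    """
    # Y[:, mask] has shape (K, N, Q): every row of Y restricted to each sample's kept dimensions.
    diff = Y[:, mask] - RX[None, :, :]
    return (diff ** 2).sum(axis=2)

X = np.arange(12, dtype=float).reshape(3, 4)
mask = np.array([[0, 2], [1, 3], [0, 3]])
RX = np.take_along_axis(X, mask, axis=1)    # sparsified samples
Y = X[:2]                                   # two full samples
distances = masked_sq_distances(RX, mask, Y)
print(distances)
```

If the mask is drawn uniformly at random with Q of P dimensions kept, multiplying these values by P / Q gives an unbiased estimate of the full squared distance; whether the library applies such a rescaling is not stated here.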
-
weighted_means(self, W)
Computes weighted full means of sparsified samples. Currently this is also used for hard assignments, in which case the zeros in W are still multiplied through; this should be optimized for speed later.
- Parameters
- W : np.ndarray, shape (N, K)
Weights. Each row corresponds to a sample, each column to a set of weights. The columns of W should sum to 1. The columns of W need not be related to one another.
- Returns
- means : np.ndarray, shape (K, P)
Weighted full means. Each row corresponds to one independent set of weights (for example, a binary W with K columns would give the means of K clusters).
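The idea of recovering full-dimensional means from sparsified samples can be sketched with NumPy. The version below averages, for each dimension, only over the samples that kept that dimension. That per-dimension normalization is an assumption made for this illustration and is not confirmed by the text above; `weighted_means_sketch` is a hypothetical helper.

```python
import numpy as np

def weighted_means_sketch(RX, mask, W, P):
    """Weighted means of shape (K, P) from sparsified samples RX (N, Q) and weights W (N, K)."""
    K = W.shape[1]
    num = np.zeros((K, P))   # weighted sums of the observed entries
    den = np.zeros((K, P))   # total weight of the samples that keep each dimension
    for k in range(K):
        w = np.broadcast_to(W[:, k:k + 1], mask.shape)
        np.add.at(num[k], mask, w * RX)   # scatter-add into the kept dimensions
        np.add.at(den[k], mask, w)
    # Dimensions that no weighted sample keeps are left at 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

X = np.arange(12, dtype=float).reshape(3, 4)
mask = np.array([[0, 2], [1, 3], [0, 3]])
RX = np.take_along_axis(X, mask, axis=1)
W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # binary W: two clusters
means = weighted_means_sketch(RX, mask, W, P=4)
print(means)
```

With a binary W, each row of the result is a cluster mean assembled from whichever samples in that cluster observed each dimension.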
-
weighted_means_and_variances(self, W)
Computes weighted full means and variances of sparsified samples. Currently this is also used for hard assignments, in which case the zeros in W are still multiplied through; this should be optimized for speed later.
- Parameters
- W : np.ndarray, shape (N, K)
Weights. Each row corresponds to a sample, each column to a set of weights. The columns of W should sum to 1. The columns of W need not be related to one another.
- Returns
- means : np.ndarray, shape (K, P)
Weighted full means. Each row corresponds to one independent set of weights (for example, a binary W with K columns would give the means of K clusters).
- variances : np.ndarray, shape (K, P)
Weighted full variances. Each row corresponds to one independent set of weights (for example, a binary W with K columns would give the variances of K clusters).
-
pairwise_mahalanobis_distances(self, means, covariances, covariance_type)
Computes the Mahalanobis distance between each compressed sample and each full mean (each row of means).
- Parameters
- means : np.ndarray, shape (K, P)
The means with which to take the Mahalanobis distances. Each row of means is a single mean in P-dimensional space.
- covariances : np.ndarray, shape (K, P) or shape (P,)
The non-zero entries of the covariance matrix. If covariance_type is ‘spherical’, must be shape (P,). If covariance_type is ‘diag’, must be shape (K, P).
- covariance_type : {‘spherical’, ‘diag’}, string
The form of the covariance matrix.
- Returns
- distances : np.ndarray, shape (N, K)
The pairwise Mahalanobis distances.
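For the ‘diag’ case, a Mahalanobis-style distance on compressed samples can be sketched by restricting each mean and its variances to the dimensions a sample kept. This is a rough illustration under stated assumptions, not the library's implementation: it returns squared distances without any rescaling, and `masked_mahalanobis_diag_sq` is a hypothetical name.

```python
import numpy as np

def masked_mahalanobis_diag_sq(RX, mask, means, covariances):
    """Squared Mahalanobis distances, shape (N, K), for diagonal covariances of shape (K, P)."""
    # Restrict every mean and variance vector to each sample's kept dimensions: (N, K, Q).
    mu = means[:, mask].transpose(1, 0, 2)
    var = covariances[:, mask].transpose(1, 0, 2)
    diff = RX[:, None, :] - mu
    return (diff ** 2 / var).sum(axis=2)

X = np.arange(12, dtype=float).reshape(3, 4)
mask = np.array([[0, 2], [1, 3], [0, 3]])
RX = np.take_along_axis(X, mask, axis=1)    # N = 3 compressed samples
means = X[:2]                               # K = 2 full means
covariances = np.full((2, 4), 4.0)          # diagonal entries, one row per mean
distances = masked_mahalanobis_diag_sq(RX, mask, means, covariances)
print(distances)
```

With all variances equal to 1 this reduces to the masked squared Euclidean distance between each sample and each mean.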