Introduction

Overview

Sparseklearn is a Python package of machine learning algorithms based on dimensionality reduction via random projections. By working on compressed data, Sparseklearn performs standard machine learning tasks more efficiently and with less memory. All of its algorithms are one-pass: they need to access the raw data only once. Sparseklearn implements the algorithms described in our papers on sparsified k-means and PCA, and on Gaussian mixtures.
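To make the idea of compressing data with a random projection concrete, here is a minimal sketch using only NumPy. It does not use Sparseklearn's API and is not the scheme the package implements. It is a generic dense Gaussian projection, and the function name random_project and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def random_project(X, n_components, seed=0):
    """Project the rows of X onto n_components random Gaussian directions.

    Hypothetical helper for illustration; not part of Sparseklearn.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Scaling by 1/sqrt(n_components) preserves squared distances in expectation.
    R = rng.standard_normal((n_features, n_components)) / np.sqrt(n_components)
    return X @ R

# Synthetic data: 100 points in 1000 dimensions.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 1000))

# Compress to 200 dimensions.
Y = random_project(X, n_components=200)

# Compare one pairwise distance before and after compression.
d_raw = np.linalg.norm(X[0] - X[1])
d_compressed = np.linalg.norm(Y[0] - Y[1])
print(Y.shape, d_raw, d_compressed)
```

The compressed array has a fifth as many columns, yet the distance between two points stays close to its original value. This is the property that lets distance-based methods such as k-means run on the smaller representation.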

Installation

We strongly recommend installing this package in a virtual environment. With the environment active, build the C extensions and install the package:

python setup.py build_ext --inplace
pip install .

To test the installation, run the unit tests:

pytest

Usage

See examples/ for notebooks with usage examples. Running them requires JupyterLab:

cd examples
pip install -r requirements.txt
jupyter lab