hyperimpute.plugins.imputers.plugin_softimpute module

class SoftImpute(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)

Bases: TransformerMixin

The SoftImpute algorithm fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm can be used to impute quantitative data. To calibrate the nuclear-norm regularization parameter (shrink_lambda), we perform cross-validation (see _approximate_shrink_val).
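For reference, here are the objective and the Soft-Impute update in the notation of the Mazumder, Hastie and Tibshirani paper cited for this class. Ω is the set of observed entries, P_Ω keeps observed entries and zeroes the rest, P_Ω^⊥ keeps the complementary (missing) entries, ‖·‖_* is the nuclear norm, λ plays the role of shrink_lambda, and W = U diag(d_1, …, d_r) V^T is an SVD. Given the max_rank parameter, this implementation presumably also truncates S_λ to the leading max_rank singular values. That detail is inferred from the parameter description, not taken from the paper.

```latex
\min_{Z}\; \tfrac{1}{2}\,\bigl\lVert P_\Omega(X) - P_\Omega(Z) \bigr\rVert_F^2 \;+\; \lambda\,\lVert Z \rVert_*

Z^{\text{new}} = S_\lambda\!\bigl(P_\Omega(X) + P_\Omega^{\perp}(Z^{\text{old}})\bigr),
\qquad
S_\lambda(W) = U\,\operatorname{diag}\bigl((d_1-\lambda)_+,\dots,(d_r-\lambda)_+\bigr)\,V^\top
```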

Parameters:

maxit – int, default=1000 Maximum number of imputation rounds to perform.
convergence_threshold – float, default=1e-5 Minimum ratio difference between iterations before stopping.
max_rank – int, default=2 Perform a truncated SVD on each iteration with this value as its rank.
shrink_lambda – float, default=0 Value by which we shrink singular values on each iteration. If it is not provided (left at 0), it is calibrated using cross-validation.
cv_len – int, default=3 The length of the grid on which the cross-validation is performed.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])

Reference: “Spectral Regularization Algorithms for Learning Large Incomplete Matrices”, by Mazumder, Hastie, and Tibshirani.

_approximate_shrink_val(X: ndarray) → float

Calibrates the shrinkage step using cross-validation: it simulates additional missing entries and tests the performance of different shrinkage values.

Parameters:

X – np.ndarray The dataset to use.

Returns:

The value to use for the shrinkage step.

Return type:

float
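The calibration can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's code. The 10% hiding fraction, the linear grid from 0 to the largest singular value, the zero initialization, and the helper names soft_impute and approximate_shrink_val are all hypothetical. Only the idea comes from the description above: hide some observed entries, then score a grid of cv_len shrinkage values on them.

```python
import numpy as np


def soft_impute(X, mask, shrink_val, max_rank=2, maxit=100):
    # Hypothetical helper: zero-fill the missing entries (assumed init), then
    # repeatedly replace them with a rank-limited, soft-thresholded SVD.
    X_filled = np.where(mask, 0.0, X)
    for _ in range(maxit):
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        s = np.maximum(s[:max_rank] - shrink_val, 0.0)
        X_hat = (U[:, :max_rank] * s) @ Vt[:max_rank]
        X_filled = np.where(mask, X_hat, X)  # observed entries are restored
    return X_filled


def approximate_shrink_val(X, cv_len=3, hide_frac=0.1, seed=0):
    # Hypothetical helper: choose the shrinkage value with the lowest error
    # on observed entries that were temporarily hidden.
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    observed = np.argwhere(~mask)
    n_hide = max(1, int(hide_frac * len(observed)))  # assumed 10% validation split
    hidden = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    rows, cols = hidden[:, 0], hidden[:, 1]

    sim_mask = mask.copy()
    sim_mask[rows, cols] = True
    X_sim = np.where(sim_mask, np.nan, X)

    # Assumed grid: cv_len evenly spaced values from 0 to the top singular value.
    top_sv = np.linalg.svd(np.nan_to_num(X_sim), compute_uv=False)[0]
    grid = np.linspace(0.0, top_sv, cv_len)

    errors = []
    for lam in grid:
        X_hat = soft_impute(X_sim, sim_mask, lam)
        errors.append(np.mean((X_hat[rows, cols] - X[rows, cols]) ** 2))
    return float(grid[int(np.argmin(errors))])
```

A larger cv_len gives a finer grid at the cost of one full imputation run per candidate value.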

_converged(Xold: ndarray, X: ndarray, mask: ndarray) → bool

Checks if the SoftImpute algorithm has converged.

Parameters:

Xold – np.ndarray The previous version of the imputed dataset.
X – np.ndarray The new version of the imputed dataset.
mask – np.ndarray The original missing mask.

Returns:

True if the algorithm has converged, False otherwise.

Return type:

bool
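A minimal sketch of such a check, under one assumption: the criterion is the relative Euclidean change of the entries under the missing mask, compared against convergence_threshold. This matches the "ratio difference" wording in the parameter list but has not been confirmed against the plugin's source. The function name converged is hypothetical.

```python
import numpy as np


def converged(X_old, X, mask, convergence_threshold=1e-5):
    # Hypothetical stopping rule: relative change of the imputed entries only
    # (positions where mask is True); observed entries never change anyway.
    old_vals = X_old[mask]
    new_vals = X[mask]
    old_norm = np.linalg.norm(old_vals)
    if old_norm == 0:
        # Avoid dividing by zero: only "converged" if nothing moved away from 0.
        return bool(np.allclose(new_vals, 0.0))
    delta = np.linalg.norm(old_vals - new_vals)
    return bool(delta / old_norm < convergence_threshold)
```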

_simulate_more_nan(X: ndarray, mask: ndarray) → ndarray

Generate more missing values for cross-validation.

Parameters:

X – np.ndarray The dataset to use.
mask – np.ndarray The existing missing positions.

Returns:

A new version of X with more missing values.

Return type:

np.ndarray
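One way to simulate extra missing values is to hide a random fraction of the currently observed entries. In this sketch, the frac parameter, seeding through numpy.random.default_rng, and the name simulate_more_nan are assumptions for illustration. The documented method takes only X and mask.

```python
import numpy as np


def simulate_more_nan(X, mask, frac=0.1, seed=0):
    # Hypothetical sketch: set a random fraction of the observed entries to NaN.
    rng = np.random.default_rng(seed)
    obs_rows, obs_cols = np.nonzero(~mask)
    n_hide = max(1, int(frac * obs_rows.size))
    pick = rng.choice(obs_rows.size, size=n_hide, replace=False)
    X_sim = X.astype(float, copy=True)  # never modify the caller's array
    X_sim[obs_rows[pick], obs_cols[pick]] = np.nan
    return X_sim
```

The hidden entries have known true values, so the reconstruction error on them can score a candidate shrinkage value.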

_softimpute(X: ndarray, shrink_val: float) → ndarray

Core loop of the algorithm. It repeatedly approximates the imputed X with an SVD-based reconstruction until the algorithm converges or maxit iterations are reached.

Parameters:

X – np.ndarray The previous version of the imputed dataset.
shrink_val – float The value by which we shrink singular values on each iteration.

Returns:

The imputed dataset.

Return type:

np.ndarray
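The loop can be sketched as follows, under these assumptions: missing entries (NaN) start at 0, each round soft-thresholds a truncated SVD with max_rank components, observed entries are always restored, and the loop stops on a relative-change rule. The name softimpute and these details are illustrative and are not taken from the plugin's source.

```python
import numpy as np


def softimpute(X, shrink_val, max_rank=2, maxit=1000, convergence_threshold=1e-5):
    # Hypothetical sketch of the SoftImpute loop on an array with NaNs.
    mask = np.isnan(X)
    X_filled = np.where(mask, 0.0, X)  # assumed zero initialization
    for _ in range(maxit):
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        s = np.maximum(s[:max_rank] - shrink_val, 0.0)  # soft-threshold, keep max_rank
        X_hat = (U[:, :max_rank] * s) @ Vt[:max_rank]
        old = X_filled[mask]
        X_filled = np.where(mask, X_hat, X)  # only missing entries are updated
        new = X_filled[mask]
        old_norm = np.linalg.norm(old)
        if old_norm > 0 and np.linalg.norm(old - new) / old_norm < convergence_threshold:
            break
    return X_filled
```

On a rank-1 matrix with one hidden entry, shrink_val=0.0 with max_rank=1 recovers the hidden value closely.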

_svd(X: ndarray, shrink_val: float) → ndarray

Reconstructs X from a low-rank, soft-thresholded SVD.

Parameters:

X – np.ndarray The previous version of the imputed dataset.
shrink_val – float The value by which we shrink singular values on each iteration.

Raises:

RuntimeError – raised if the static checks on the final result fail.

Returns:

A new candidate for the result.

Return type:

np.ndarray
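The reconstruction step can be sketched as a truncated SVD whose singular values are soft-thresholded: each is reduced by shrink_val and clipped at zero. The finiteness check stands in for the documented "static checks" and is an assumption. The name svd_shrink is also hypothetical.

```python
import numpy as np


def svd_shrink(X, shrink_val, max_rank=2):
    # Hypothetical sketch: rank-limited, soft-thresholded SVD reconstruction.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(max_rank, s.size)
    s_shrunk = np.maximum(s[:k] - shrink_val, 0.0)
    X_reconstructed = (U[:, :k] * s_shrunk) @ Vt[:k]
    # Assumed stand-in for the "static checks" that raise RuntimeError.
    if not np.all(np.isfinite(X_reconstructed)):
        raise RuntimeError("SVD reconstruction produced non-finite values")
    return X_reconstructed
```

For example, diag(3, 2, 1) with shrink_val=0.5 and max_rank=2 becomes diag(2.5, 1.5, 0). The smallest singular value is dropped by the rank limit, and the other two are shrunk by 0.5.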

fit(**kwargs: Any) → Any

fit_transform(**kwargs: Any) → Any

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

classmethod load(buff: bytes) → SoftImpute

save() → bytes

transform(**kwargs: Any) → Any

class SoftImputePlugin(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)

Bases: ImputerPlugin

Imputation plugin for completing missing values using the SoftImpute strategy.

Method: See the SoftImpute class implementation for details.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
              0             1             2             3
0  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
1  3.820605e-16  1.708249e-16  1.708249e-16  3.820605e-16
2  1.000000e+00  2.000000e+00  2.000000e+00  1.000000e+00
3  2.000000e+00  2.000000e+00  2.000000e+00  2.000000e+00
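The near-zero second row is consistent with zero initialization of the missing entries, though this documentation does not state that assumption. Each left singular vector with a nonzero singular value satisfies u_k = X v_k / σ_k, so a row of zeros gets a zero coordinate in all of them. A rank-limited reconstruction therefore leaves that row at 0 up to floating-point rounding. A minimal check on the example matrix, with the all-NaN row zero-filled:

```python
import numpy as np

# Example matrix from above with the fully-missing row zero-filled (assumed init).
X0 = np.array([[1, 1, 1, 1],
               [0, 0, 0, 0],
               [1, 2, 2, 1],
               [2, 2, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(X0, full_matrices=False)
k = 2  # max_rank default
X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
print(X_hat[1])  # approximately 0 in every column
```

The observed rows have rank 2 (the last row is twice the first), so the rank-2 reconstruction also reproduces them exactly.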


_fit(**kwargs: Any) → Any

_transform(**kwargs: Any) → Any

static hyperparameter_space(*args: Any, **kwargs: Any) → List[Params]

module_relative_path: Optional[Path]

static name() → str

plugin: alias of SoftImputePlugin