hyperimpute.plugins.imputers.plugin_softimpute module

class SoftImpute(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)

Bases: TransformerMixin

The SoftImpute algorithm fits a low-rank approximation to a matrix with missing values via nuclear-norm regularization. It can be used to impute quantitative data. To calibrate the nuclear-norm regularization parameter (shrink_lambda), we perform cross-validation (_cv_softimpute).

Parameters:
  • maxit – int, default=1000 Maximum number of imputation rounds to perform.

  • convergence_threshold – float, default=1e-5 Stop when the relative change between consecutive iterations falls below this value.

  • max_rank – int, default=2 Perform a truncated SVD on each iteration with this value as its rank.

  • shrink_lambda – float, default=0 Value by which we shrink singular values on each iteration. If it’s missing, it is calibrated using cross validation.

  • cv_len – int, default=3 The length of the grid on which the cross-validation is performed.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])

Reference: “Spectral Regularization Algorithms for Learning Large Incomplete Matrices”, by Mazumder, Hastie, and Tibshirani.

_approximate_shrink_val(X: ndarray) float

Try to calibrate the shrinkage step using cross-validation. It simulates more missing items and tests the performance of different shrinkage values.

Parameters:

X – np.ndarray The dataset to use.

Returns:

The value to use for the shrinkage step.

Return type:

float
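The calibration described above can be sketched as a small grid search. This is an illustration under stated assumptions, not the library's code: the function name pick_shrink_val, the impute callable, the 20% hold-out fraction and the mean-squared-error score are all hypothetical choices made for the example.

```python
import numpy as np

def pick_shrink_val(X, impute, grid, rng):
    """Return the grid value whose imputation best recovers hidden entries.

    `impute(X_with_nan, shrink_val)` is any imputer returning a filled array
    (hypothetical interface, assumed for this sketch).
    """
    observed = np.argwhere(~np.isnan(X))
    # Hide roughly 20% of the observed entries (assumed fraction).
    n_hide = max(1, len(observed) // 5)
    hidden = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    rows, cols = hidden[:, 0], hidden[:, 1]

    X_sim = X.copy()
    X_sim[rows, cols] = np.nan  # simulate additional missing values

    errors = []
    for shrink_val in grid:
        X_hat = impute(X_sim, shrink_val)
        # Score only on the entries we hid, where the truth is known.
        errors.append(np.mean((X_hat[rows, cols] - X[rows, cols]) ** 2))
    return float(grid[int(np.argmin(errors))])

# Toy imputer used only to exercise the search: fills gaps with 1 / (1 + shrink_val).
toy_impute = lambda X, lam: np.where(np.isnan(X), 1.0 / (1.0 + lam), X)

X = np.ones((4, 3))
X[0, 0] = np.nan
best = pick_shrink_val(X, toy_impute, [0.0, 0.5, 2.0], np.random.default_rng(0))
print(best)
```

Scoring only on entries that were deliberately hidden lets the search compare candidate values against known ground truth.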

_converged(Xold: ndarray, X: ndarray, mask: ndarray) bool

Checks if the SoftImpute algorithm has converged.

Parameters:
  • Xold – np.ndarray The previous version of the imputed dataset.

  • X – np.ndarray The new version of the imputed dataset.

  • mask – np.ndarray The original missing mask.

Returns:

True if the algorithm has converged, False otherwise.

Return type:

bool
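One plausible form of such a check is a relative-change test on the imputed entries. The sketch below assumes the ratio is computed only over the positions marked in mask and compared against convergence_threshold; the function name has_converged and these details are assumptions, not taken from the library source.

```python
import numpy as np

def has_converged(X_old, X_new, mask, threshold=1e-5):
    """Relative change of the imputed entries between two iterations (sketch)."""
    diff = X_old[mask] - X_new[mask]
    denom = np.sqrt(np.sum(X_old[mask] ** 2))
    if denom == 0.0:
        # No scale to compare against yet (e.g. gaps still filled with zeros).
        return False
    return bool(np.sqrt(np.sum(diff ** 2)) / denom < threshold)

X_old = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[False, True], [True, False]])
X_new = X_old.copy()
X_new[0, 1] = 3.0  # one imputed entry changed noticeably

print(has_converged(X_old, X_old.copy(), mask))
print(has_converged(X_old, X_new, mask))
```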

_simulate_more_nan(X: ndarray, mask: ndarray) ndarray

Generate more missing values for cross-validation.

Parameters:
  • X – np.ndarray The dataset to use.

  • mask – np.ndarray The existing missing positions

Returns:

A new version of X with more missing values.

Return type:

Xsim
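A minimal sketch of this step, assuming one extra observed entry is hidden per row. The selection rule, the function name simulate_more_nan, and returning the updated mask alongside the data are illustrative choices, not the documented behaviour.

```python
import numpy as np

def simulate_more_nan(X, mask, rng):
    """Hide one additional observed entry in every row that has any (sketch)."""
    X_sim = X.astype(float).copy()   # leave the caller's array untouched
    new_mask = mask.copy()
    for i in range(X.shape[0]):
        observed = np.flatnonzero(~mask[i])
        if observed.size > 0:
            j = rng.choice(observed)
            new_mask[i, j] = True
            X_sim[i, j] = np.nan
    return X_sim, new_mask

X = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]])
mask = np.isnan(X)
X_sim, new_mask = simulate_more_nan(X, mask, np.random.default_rng(0))
print(X_sim)
```

Because the original values at the newly hidden positions are still available in X, they can serve as ground truth when scoring candidate shrinkage values.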

_softimpute(X: ndarray, shrink_val: float) ndarray

Core loop of the algorithm. It repeatedly approximates the imputed X using a thresholded SVD until the algorithm converges or maxit iterations are reached.

Parameters:
  • X – np.ndarray The previous version of the imputed dataset.

  • shrink_val – float The value by which we shrink singular values on each iteration.

Returns:

The imputed dataset.

Return type:

X_hat
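The loop can be sketched as follows. This is an illustrative version under assumptions: gaps start at zero, the SVD is a full numpy.linalg.svd truncated to max_rank, and the stopping rule is a relative Frobenius-norm change. The function name soft_impute and its parameter names are hypothetical.

```python
import numpy as np

def soft_impute(X, shrink_val, max_rank=2, maxit=1000, tol=1e-5):
    """Iteratively refill missing entries from a shrunken low-rank fit (sketch)."""
    mask = np.isnan(X)
    X_hat = np.where(mask, 0.0, X)            # start with zeros in the gaps
    for _ in range(maxit):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        s = np.maximum(s[:max_rank] - shrink_val, 0.0)   # soft-threshold
        X_low = (U[:, :max_rank] * s) @ Vt[:max_rank]
        X_new = np.where(mask, X_low, X)      # observed entries stay fixed
        change = np.linalg.norm(X_new - X_hat)
        scale = np.linalg.norm(X_hat)
        X_hat = X_new
        if scale > 0 and change / scale < tol:
            break
    return X_hat

# Rank-1 matrix with one hidden entry; the true value at [1, 1] is 4.0.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
X_missing = X.copy()
X_missing[1, 1] = np.nan
X_hat = soft_impute(X_missing, shrink_val=0.0, max_rank=1)
print(X_hat[1, 1])
```

Only the missing positions are overwritten on each pass, so the observed data act as a fixed constraint while the low-rank fit is refined.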

_svd(X: ndarray, shrink_val: float) ndarray

Reconstructs X from low-rank thresholded SVD.

Parameters:
  • X – np.ndarray The previous version of the imputed dataset.

  • shrink_val – float The value by which we shrink singular values on each iteration.

Raises:

RuntimeError – raised if the static checks on the final result fail.

Returns:

new candidate for the result.

Return type:

X_reconstructed
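The reconstruction step can be illustrated with plain NumPy. This sketch assumes a full SVD followed by truncation to max_rank and soft-thresholding of the singular values by shrink_val. The function name thresholded_svd is hypothetical, and the library may use a different SVD routine.

```python
from typing import Optional

import numpy as np

def thresholded_svd(X: np.ndarray, shrink_val: float,
                    max_rank: Optional[int] = None) -> np.ndarray:
    """Rebuild X from a rank-limited, soft-thresholded SVD (sketch)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if max_rank is not None:
        U, s, Vt = U[:, :max_rank], s[:max_rank], Vt[:max_rank, :]
    # Shrink each singular value toward zero; values below shrink_val vanish.
    s_shrunk = np.maximum(s - shrink_val, 0.0)
    return (U * s_shrunk) @ Vt

X = np.diag([3.0, 1.0])            # singular values are 3 and 1
print(thresholded_svd(X, 0.5))     # both singular values reduced by 0.5
print(thresholded_svd(X, 1.5))     # the smaller one is zeroed out
```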

fit(**kwargs: Any) Any

fit_transform(**kwargs: Any) Any

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

classmethod load(buff: bytes) SoftImpute

save() bytes

transform(**kwargs: Any) Any

class SoftImputePlugin(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)

Bases: ImputerPlugin

Imputation plugin for completing missing values using the SoftImpute strategy.

Method:

Details in the SoftImpute class implementation.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
              0             1             2             3
0  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
1  3.820605e-16  1.708249e-16  1.708249e-16  3.820605e-16
2  1.000000e+00  2.000000e+00  2.000000e+00  1.000000e+00
3  2.000000e+00  2.000000e+00  2.000000e+00  2.000000e+00
_fit(**kwargs: Any) Any
_transform(**kwargs: Any) Any
static hyperparameter_space(*args: Any, **kwargs: Any) List[Params]
module_relative_path: Optional[Path]
static name() str
plugin

alias of SoftImputePlugin