hyperimpute.plugins.imputers.plugin_softimpute module
- class SoftImpute(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)
Bases:
TransformerMixin
The SoftImpute algorithm fits a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm can be used to impute quantitative data. To calibrate the nuclear-norm regularization parameter (shrink_lambda), we perform cross-validation (_cv_softimpute).
- Parameters:
maxit – int, default=1000 Maximum number of imputation rounds to perform.
convergence_threshold – float, default=1e-5 Minimum ratio of change between successive iterations before stopping.
max_rank – int, default=2 Perform a truncated SVD on each iteration with this value as its rank.
shrink_lambda – float, default=0 Value by which we shrink singular values on each iteration. If it is not set (left at 0), it is calibrated using cross-validation.
cv_len – int, default=3 The length of the grid on which the cross-validation is performed.
Example
>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
Reference: “Spectral Regularization Algorithms for Learning Large Incomplete Matrices”, by Mazumder, Hastie, and Tibshirani.
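For orientation, the optimisation problem and update rule from the cited paper can be written as below. This is the paper's formulation, not a statement about this implementation's internals. Reading λ as shrink_lambda is an assumption based on the parameter description, and the max_rank truncation is an additional constraint that this formulation does not show.

```latex
% Objective: \Omega indexes the observed entries of X, P_\Omega zeroes everything else
\min_{Z}\; \tfrac{1}{2}\,\lVert P_\Omega(X) - P_\Omega(Z) \rVert_F^2 + \lambda \lVert Z \rVert_*

% Soft-thresholded SVD of W = U D V^\top, with D = \mathrm{diag}(d_1, \dots, d_r)
S_\lambda(W) = U D_\lambda V^\top, \qquad
D_\lambda = \mathrm{diag}\big[(d_1 - \lambda)_+, \dots, (d_r - \lambda)_+\big]

% One update: keep the observed entries, fill the rest from the current estimate
Z^{\text{new}} = S_\lambda\big(P_\Omega(X) + P_\Omega^{\perp}(Z^{\text{old}})\big)
```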
- _approximate_shrink_val(X: ndarray) float
Try to calibrate the shrinkage step using cross-validation. It simulates more missing items and tests the performance of different shrinkage values.
- Parameters:
X – np.ndarray The dataset to use.
- Returns:
The value to use for the shrinkage step.
- Return type:
float
- _converged(Xold: ndarray, X: ndarray, mask: ndarray) bool
Checks if the SoftImpute algorithm has converged.
- Parameters:
Xold – np.ndarray The previous version of the imputed dataset.
X – np.ndarray The new version of the imputed dataset.
mask – np.ndarray The original missing mask.
- Returns:
True if the algorithm has converged, False otherwise.
- Return type:
bool
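A relative-change convergence test of this kind could look like the sketch below. It is an illustration, not the library's code: the function name `converged` and the zero-norm guard are assumptions, and the sketch compares the full matrices without using the mask, since how the mask enters the real check is not documented here.

```python
import numpy as np


def converged(X_old, X_new, threshold=1e-5):
    """Return True when the relative Frobenius-norm change is below threshold."""
    change = np.linalg.norm(X_old - X_new)
    denom = np.linalg.norm(X_old)
    if denom == 0:
        # Assumption: an all-zero previous estimate is never treated as converged.
        return False
    return bool(change / denom < threshold)


X_prev = np.array([[1.0, 2.0], [3.0, 4.0]])
print(converged(X_prev, X_prev + 1e-9))  # tiny update
print(converged(X_prev, X_prev + 0.5))   # large update
```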
- _simulate_more_nan(X: ndarray, mask: ndarray) ndarray
Generate more missing values for cross-validation.
- Parameters:
X – np.ndarray The dataset to use.
mask – np.ndarray The existing missing positions
- Returns:
A new version of X with more missing values.
- Return type:
ndarray
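One way to hide additional observed entries for cross-validation is sketched below. The name `simulate_more_nan`, the `frac` and `seed` parameters, and the uniform sampling over observed cells are all assumptions for illustration; the library's own sampling scheme may differ.

```python
import numpy as np


def simulate_more_nan(X, mask, frac=0.2, seed=0):
    """Return a copy of X with a random fraction of observed cells set to NaN.

    mask is a boolean array that is True where X is already missing.
    """
    rng = np.random.default_rng(seed)
    X_sim = np.array(X, dtype=float)       # copy, so the input is left untouched
    observed = np.argwhere(~mask)          # (row, col) pairs of observed cells
    if len(observed) == 0:
        return X_sim
    n_hide = max(1, int(frac * len(observed)))
    picked = observed[rng.choice(len(observed), size=n_hide, replace=False)]
    X_sim[picked[:, 0], picked[:, 1]] = np.nan
    return X_sim


X_demo = np.array([[1.0, 1.0, 1.0, 1.0],
                   [np.nan, 2.0, np.nan, 2.0],
                   [1.0, 2.0, 2.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0]])
X_more = simulate_more_nan(X_demo, np.isnan(X_demo))
print(np.isnan(X_demo).sum(), np.isnan(X_more).sum())
```

The hidden cells keep their true values in the original X, so an imputer's error on them can be measured.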
- _softimpute(X: ndarray, shrink_val: float) ndarray
Core loop of the algorithm. It repeatedly approximates the imputed X using a thresholded SVD until the algorithm converges or maxit iterations are reached.
- Parameters:
X – np.ndarray The previous version of the imputed dataset.
shrink_val – float The value by which we shrink singular values on each iteration.
- Returns:
The imputed dataset.
- Return type:
ndarray
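The loop described above can be sketched in plain NumPy as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the name `soft_impute`, the zero initialisation of missing cells, the full `np.linalg.svd` followed by truncation to `max_rank`, and the relative-change stopping rule are all choices made for the example.

```python
import numpy as np


def soft_impute(X, shrink_val, max_rank=2, maxit=1000, tol=1e-5):
    """Fill NaNs in X by iterating a rank-truncated, soft-thresholded SVD."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    X_hat = np.where(mask, 0.0, X)  # assumption: start missing cells at zero
    for _ in range(maxit):
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        s = np.maximum(s[:max_rank] - shrink_val, 0.0)   # shrink singular values
        X_low = (U[:, :max_rank] * s) @ Vt[:max_rank, :]  # low-rank reconstruction
        # Keep observed values; take the reconstruction only at missing cells.
        X_new = np.where(mask, X_low, X)
        denom = np.linalg.norm(X_hat)
        done = denom > 0 and np.linalg.norm(X_new - X_hat) / denom < tol
        X_hat = X_new
        if done:
            break
    return X_hat


# A rank-1 matrix with one hidden entry.
X_demo = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
X_demo[2, 2] = np.nan
print(soft_impute(X_demo, shrink_val=0.0, max_rank=1))
```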
- _svd(X: ndarray, shrink_val: float) ndarray
Reconstructs X from low-rank thresholded SVD.
- Parameters:
X – np.ndarray The previous version of the imputed dataset.
shrink_val – float The value by which we shrink singular values on each iteration.
- Raises:
RuntimeError – raised if the static checks on the final result fail.
- Returns:
A new candidate for the result.
- Return type:
ndarray
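A low-rank thresholded reconstruction can be illustrated as below. The sketch uses a full `np.linalg.svd` and then truncates to `max_rank`; that choice, and the name `svd_shrink`, are assumptions for the example, and the library may compute its truncated SVD differently.

```python
import numpy as np


def svd_shrink(X, shrink_val, max_rank=None):
    """Rebuild X from its SVD after soft-thresholding the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if max_rank is not None:
        # Keep only the leading max_rank singular triplets.
        U, s, Vt = U[:, :max_rank], s[:max_rank], Vt[:max_rank, :]
    s_thresh = np.maximum(s - shrink_val, 0.0)  # values below shrink_val drop to 0
    return (U * s_thresh) @ Vt                  # scale columns of U, then recombine


# Singular values 3 and 1 become 2 and 0 when shrunk by 1.
D = np.diag([3.0, 1.0])
print(svd_shrink(D, 1.0))
```

Shrinking by more than the smallest singular value lowers the rank of the result, which is how the regularisation parameter controls model complexity.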
- fit(**kwargs: Any) Any
- fit_transform(**kwargs: Any) Any
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- classmethod load(buff: bytes) SoftImpute
- save() bytes
- transform(**kwargs: Any) Any
- class SoftImputePlugin(maxit: int = 1000, convergence_threshold: float = 1e-05, max_rank: int = 2, shrink_lambda: float = 0, cv_len: int = 3, random_state: int = 0)
Bases:
ImputerPlugin
Imputation plugin for completing missing values using the SoftImpute strategy.
- Method:
Details in the SoftImpute class implementation.
Example
>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("softimpute")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
              0             1             2             3
0  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
1  3.820605e-16  1.708249e-16  1.708249e-16  3.820605e-16
2  1.000000e+00  2.000000e+00  2.000000e+00  1.000000e+00
3  2.000000e+00  2.000000e+00  2.000000e+00  2.000000e+00
- _abc_impl = <_abc_data object>
- _fit(**kwargs: Any) Any
- _transform(**kwargs: Any) Any
- module_relative_path: Optional[Path]
- static name() str
- plugin
alias of
SoftImputePlugin