hyperimpute.plugins.imputers.plugin_gain module
- class GainImputation(batch_size: int = 256, n_epochs: int = 1000, hint_rate: float = 0.9, loss_alpha: float = 10)
Bases:
TransformerMixin
GAIN imputation for static data using Generative Adversarial Nets. The training steps are:
1. The generator imputes the missing components conditioned on the observed components, and outputs a completed vector.
2. The discriminator takes a completed vector and attempts to determine which components were actually observed and which were imputed.
- Parameters:
batch_size – int The batch size for the training steps.
n_epochs – int Number of epochs for training.
hint_rate – float Percentage of additional information for the discriminator.
loss_alpha – float Hyperparameter for the generator loss.
Paper: J. Yoon, J. Jordon, M. van der Schaar, “GAIN: Missing Data Imputation using Generative Adversarial Nets,” ICML, 2018. Original code: https://github.com/jsyoon0823/GAIN
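The generator input and the discriminator hint described in the paper can be sketched with NumPy. This is an illustrative reconstruction, not the library's internal code; the variable names (`X_tilde`, `B`, `H`) follow the paper's notation, and `hint_rate` matches the constructor parameter above:

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[1.0, np.nan], [np.nan, 2.0]])  # incomplete data
M = (~np.isnan(X)).astype(float)              # observation mask: 1 = observed
Z = rng.uniform(0.0, 0.01, size=X.shape)      # noise for the missing slots

# Generator input: keep observed values, replace missing entries by noise
X_tilde = M * np.nan_to_num(X) + (1.0 - M) * Z

# Hint matrix: reveal a fraction `hint_rate` of the true mask to the
# discriminator; unrevealed entries are set to the uninformative value 0.5
hint_rate = 0.9
B = (rng.uniform(size=X.shape) < hint_rate).astype(float)
H = B * M + 0.5 * (1.0 - B)
```

The hint matrix is the "additional information" controlled by `hint_rate`: without it, the discriminator's task is under-determined and training is unstable.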
- fit(X: Tensor) GainImputation
Train the GAIN model.
- Parameters:
X – incomplete dataset.
- Returns:
the updated model.
- Return type:
self
- fit_transform(X: Tensor) Tensor
Imputes the provided dataset using the GAIN strategy.
- Parameters:
X – np.ndarray A dataset with missing values.
- Returns:
The imputed dataset.
- Return type:
Xhat
- transform(Xmiss: Tensor) Tensor
Return the imputed data produced by the trained GAIN model.
- Parameters:
Xmiss – the array with missing data
- Returns:
the array without missing data
- Return type:
torch.Tensor
- Raises:
RuntimeError – if the imputed result still contains np.nan values.
- class GainModel(dim: int, h_dim: int, loss_alpha: float = 10)
Bases:
object
The core model for GAIN Imputation.
- Parameters:
dim – int Number of features.
h_dim – int Size of the hidden layer.
loss_alpha – float Hyperparameter for the generator loss.
- discr_loss(X: Tensor, M: Tensor, H: Tensor) Tensor
- discriminator(X: Tensor, hints: Tensor) Tensor
- gen_loss(X: Tensor, M: Tensor, H: Tensor) Tensor
- generator(X: Tensor, mask: Tensor) Tensor
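The two loss terms computed by `discr_loss` and `gen_loss` can be sketched in NumPy, following the objectives in the GAIN paper. This is a simplified illustration, not the class's implementation; `d_prob` stands for the discriminator's output probabilities and `g_out` for the generator's completed vector (both assumed names):

```python
import numpy as np

def discr_loss_sketch(d_prob, M, eps=1e-8):
    # Cross-entropy: the discriminator should predict 1 for observed
    # components and 0 for imputed ones
    return -np.mean(M * np.log(d_prob + eps)
                    + (1.0 - M) * np.log(1.0 - d_prob + eps))

def gen_loss_sketch(d_prob, g_out, X, M, loss_alpha=10.0, eps=1e-8):
    # Adversarial term: push the discriminator toward 1 on imputed entries,
    # plus an alpha-weighted reconstruction term on the observed entries
    adversarial = -np.mean((1.0 - M) * np.log(d_prob + eps))
    reconstruction = np.sum((M * X - M * g_out) ** 2) / np.sum(M)
    return adversarial + loss_alpha * reconstruction
```

`loss_alpha` trades off fooling the discriminator against faithfully reconstructing the observed components.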
- class GainPlugin(batch_size: int = 128, n_epochs: int = 100, hint_rate: float = 0.8, loss_alpha: int = 10, random_state: int = 0)
Bases:
ImputerPlugin
Imputation plugin for completing missing values using the GAIN strategy.
- Method:
Details in the GainImputation class implementation.
Example
>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("gain")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])
- _fit(**kwargs: Any) Any
- _transform(**kwargs: Any) Any
- module_relative_path: Optional[Path]
- static name() str
- plugin
alias of GainPlugin
- sample_M(m: int, n: int, p: float) ndarray
Hint vector generation.
- Parameters:
m – number of rows
n – number of columns
p – hint rate
- Returns:
generated random values
- Return type:
np.ndarray
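A plausible sketch of this helper, assuming it thresholds uniform noise at the hint rate as in the reference implementation (an assumption, not the library's verbatim code):

```python
import numpy as np

def sample_M(m: int, n: int, p: float) -> np.ndarray:
    # Binary (m, n) matrix: each entry is 1 with probability p (the hint rate)
    return (np.random.uniform(0.0, 1.0, size=(m, n)) < p).astype(float)
```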
- sample_Z(m: int, n: int) ndarray
Random sample generator for Z.
- Parameters:
m – number of rows
n – number of columns
- Returns:
generated random values
- Return type:
np.ndarray
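A sketch of this helper, assuming the small uniform noise range used by the reference implementation (the `0.01` upper bound is an assumption):

```python
import numpy as np

def sample_Z(m: int, n: int) -> np.ndarray:
    # Small uniform noise used to fill missing entries before generation
    return np.random.uniform(0.0, 0.01, size=(m, n))
```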
- sample_idx(m: int, n: int) ndarray
Mini-batch index generation.
- Parameters:
m – number of rows
n – number of columns
- Returns:
generated random indices
- Return type:
np.ndarray
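A sketch of this helper, assuming it draws a mini-batch of `n` distinct row indices out of `m` (an assumption about the sampling scheme; the reference implementation shuffles and truncates):

```python
import numpy as np

def sample_idx(m: int, n: int) -> np.ndarray:
    # Shuffle the m row indices and keep the first n as a mini-batch
    return np.random.permutation(m)[:n]
```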