hyperimpute.plugins.imputers.plugin_gain module

class GainImputation(batch_size: int = 256, n_epochs: int = 1000, hint_rate: float = 0.9, loss_alpha: float = 10)

Bases: TransformerMixin

GAIN Imputation for static data using Generative Adversarial Nets. The training steps are:

The generator imputes the missing components conditioned on what is actually observed, and outputs a completed vector.

The discriminator takes a completed vector and attempts to determine which components were actually observed and which were imputed.
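The two steps above can be sketched with NumPy. This is only an illustration under stated assumptions, not the library's code: the array `g` stands in for the output of a trained generator, and the helper name `complete` is hypothetical.

```python
import numpy as np

def complete(x, generator_output):
    """Keep the observed entries of x and fill the missing ones from the generator output."""
    mask = ~np.isnan(x)                               # True where a value was observed
    completed = np.where(mask, x, generator_output)   # observed value, else imputed value
    return completed, mask.astype(float)

x = np.array([[1.0, np.nan], [np.nan, 4.0]])
g = np.array([[0.5, 2.0], [3.0, 0.5]])                # stand-in for a generator's output
completed, mask = complete(x, g)
# The discriminator is trained to recover `mask` from `completed`:
# 1 marks an observed component, 0 marks an imputed one.
```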

Parameters:

batch_size – int The batch size for the training steps.
n_epochs – int Number of epochs for training.
hint_rate – float Probability with which each mask entry is revealed to the discriminator as a hint.
loss_alpha – float Weight of the reconstruction term in the generator loss.

Paper: J. Yoon, J. Jordon, M. van der Schaar, “GAIN: Missing Data Imputation using Generative Adversarial Nets,” ICML, 2018. Original code: https://github.com/jsyoon0823/GAIN

fit(X: Tensor) → GainImputation

Train the GAIN model.

Parameters: X – incomplete dataset.
Returns: the fitted model (self).
Return type: GainImputation

fit_transform(X: Tensor) → Tensor

Imputes the provided dataset using the GAIN strategy.

Parameters: X – torch.Tensor A dataset with missing values.
Returns: The imputed dataset.
Return type: torch.Tensor

transform(Xmiss: Tensor) → Tensor

Return the data imputed by the trained GAIN model.

Parameters: Xmiss – the array with missing data
Returns: the array without missing data
Return type: torch.Tensor
Raises: RuntimeError – if the result contains np.nans.
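The RuntimeError condition can be illustrated with a small check. This is a sketch only: the helper name `ensure_complete` and the error message are hypothetical, and the library's actual check may differ.

```python
import numpy as np

def ensure_complete(x_hat):
    """Return x_hat unchanged, or raise RuntimeError if any NaN survived imputation."""
    if np.isnan(x_hat).any():
        raise RuntimeError("imputed data still contains NaNs")
    return x_hat

ok = ensure_complete(np.array([[1.0, 2.0], [3.0, 4.0]]))
```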

class GainModel(dim: int, h_dim: int, loss_alpha: float = 10)

Bases: object

The core model for GAIN Imputation.

Parameters:

dim – int Number of features.
h_dim – int Size of the hidden layer.
loss_alpha – float Hyperparameter for the generator loss.

discr_loss(X: Tensor, M: Tensor, H: Tensor) → Tensor

discriminator(X: Tensor, hints: Tensor) → Tensor

gen_loss(X: Tensor, M: Tensor, H: Tensor) → Tensor

generator(X: Tensor, mask: Tensor) → Tensor
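The loss methods above are listed without descriptions. As a rough guide, the sketch below follows the losses used in the reference GAIN code linked earlier; treating them as what `discr_loss` and `gen_loss` compute is an assumption. The function names and the `d_prob` and `g_sample` arguments are hypothetical: the library's methods take `(X, M, H)` and run the networks internally, which this sketch does not reproduce.

```python
import numpy as np

EPS = 1e-8  # keeps the logarithms finite

def discr_loss_sketch(d_prob, mask):
    """Cross-entropy between the discriminator's output and the true mask."""
    return -np.mean(mask * np.log(d_prob + EPS)
                    + (1 - mask) * np.log(1 - d_prob + EPS))

def gen_loss_sketch(x, g_sample, d_prob, mask, loss_alpha=10.0):
    """Adversarial term on imputed entries plus a weighted reconstruction term."""
    adversarial = -np.mean((1 - mask) * np.log(d_prob + EPS))
    # Reconstruction error is measured on observed entries only.
    reconstruction = np.mean((mask * x - mask * g_sample) ** 2) / np.mean(mask)
    return adversarial + loss_alpha * reconstruction
```

In this form, a larger `loss_alpha` pushes the generator to reproduce the observed entries more closely, at the expense of the adversarial term.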

class GainPlugin(batch_size: int = 128, n_epochs: int = 100, hint_rate: float = 0.8, loss_alpha: int = 10, random_state: int = 0)

Bases: ImputerPlugin

Imputation plugin for completing missing values using the GAIN strategy.

Method: Details in the GainImputation class implementation.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("gain")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])


_fit(**kwargs: Any) → Any

_transform(**kwargs: Any) → Any

static hyperparameter_space(*args: Any, **kwargs: Any) → List[Params]

module_relative_path: Optional[Path]

static name() → str

plugin: alias of GainPlugin

sample_M(m: int, n: int, p: float) → ndarray

Hint Vector Generation

Parameters:

m – number of rows
n – number of columns
p – hint rate

Returns:

generated random values

Return type:

np.ndarray
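One plausible way to generate such a hint matrix is a Bernoulli draw per entry. The sketch below is an assumption, not the library's implementation: the name `sample_binary_hint` and the `rng` argument are hypothetical, and each entry is set to 1 with probability `p`. The page does not say whether the library uses this convention or its complement.

```python
import numpy as np

def sample_binary_hint(m, n, p, rng=None):
    """Return an (m, n) matrix of 0/1 floats; each entry is 1 with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    return (rng.uniform(0.0, 1.0, size=(m, n)) < p).astype(float)

b = sample_binary_hint(4, 3, 0.9, rng=np.random.default_rng(0))
mask = np.ones((4, 3))
hint = b * mask   # the discriminator sees the mask only where b is 1
```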

sample_Z(m: int, n: int) → ndarray

Random sample generator for Z.

Parameters:

m – number of rows
n – number of columns

Returns:

generated random values

Return type:

np.ndarray
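A minimal sketch of such a noise sampler is shown below. The name `sample_noise`, the `rng` argument, and the upper bound `high=0.01` are assumptions made for illustration; the range the library actually uses is not stated on this page.

```python
import numpy as np

def sample_noise(m, n, high=0.01, rng=None):
    """Return an (m, n) matrix of uniform noise in [0, high)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(0.0, high, size=(m, n))

z = sample_noise(4, 3, rng=np.random.default_rng(0))
```

In GAIN-style training this noise typically fills the missing positions of the generator's input, so it is kept small relative to normalized feature values.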

sample_idx(m: int, n: int) → ndarray

Mini-batch generation

Parameters:

m – number of rows
n – number of columns

Returns:

generated random indices

Return type:

np.ndarray
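Mini-batch index sampling is commonly done by shuffling the row indices and keeping a prefix. The sketch below assumes that approach; the function name and its parameters (`n_rows`, `batch_size`, `rng`) are hypothetical and do not mirror the documented signature.

```python
import numpy as np

def sample_batch_indices(n_rows, batch_size, rng=None):
    """Return up to batch_size distinct row indices drawn from range(n_rows)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.permutation(n_rows)[:batch_size]

idx = sample_batch_indices(10, 4, rng=np.random.default_rng(0))
```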