hyperimpute.plugins.imputers.plugin_gain module

class GainImputation(batch_size: int = 256, n_epochs: int = 1000, hint_rate: float = 0.9, loss_alpha: float = 10)

Bases: TransformerMixin

GAIN Imputation for static data using Generative Adversarial Nets. The training steps are:

  • The generator imputes the missing components conditioned on what is actually observed, and outputs a completed vector.

  • The discriminator takes a completed vector and attempts to determine which components were actually observed and which were imputed.
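
The "completed vector" in the steps above can be illustrated with a minimal, dependency-free sketch. It assumes the convention from the GAIN paper that observed components are kept and only missing ones are taken from the generator output. The function name complete_vector and the sample values are hypothetical and not part of hyperimpute.

```python
from typing import List


def complete_vector(x: List[float], mask: List[int], generated: List[float]) -> List[float]:
    """Keep observed components (mask == 1) and fill the rest from the generator.

    Illustrative helper only; missing entries of `x` are assumed to be stored
    as a 0.0 placeholder rather than NaN, since 0 * NaN would still be NaN.
    """
    return [m * xi + (1 - m) * gi for xi, m, gi in zip(x, mask, generated)]


x = [0.5, 0.0, 0.2]          # second component is missing (placeholder 0.0)
mask = [1, 0, 1]             # 1 = observed, 0 = missing
generated = [0.9, 0.7, 0.1]  # hypothetical generator output
completed = complete_vector(x, mask, generated)
print(completed)
```

The discriminator then receives a vector like completed and tries to recover mask from it.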

Parameters:
  • batch_size – int The batch size for the training steps.

  • n_epochs – int Number of epochs for training.

  • hint_rate – float Hint rate, a value between 0 and 1 that controls how much additional information about the mask the discriminator receives.

  • loss_alpha – float Hyperparameter for the generator loss.

Paper: J. Yoon, J. Jordon, M. van der Schaar, “GAIN: Missing Data Imputation using Generative Adversarial Nets,” ICML, 2018. Original code: https://github.com/jsyoon0823/GAIN

fit(X: Tensor) GainImputation

Train the GAIN model.

Parameters:

X – incomplete dataset.

Returns:

the updated model.

Return type:

GainImputation

fit_transform(X: Tensor) Tensor

Imputes the provided dataset using the GAIN strategy.

Parameters:

X – torch.Tensor A dataset with missing values.

Returns:

The imputed dataset.

Return type:

torch.Tensor

transform(Xmiss: Tensor) Tensor

Return the data imputed by the trained GAIN model.

Parameters:

Xmiss – the array with missing data

Returns:

the array without missing data

Return type:

torch.Tensor

Raises:

RuntimeError – if the result contains NaN values.

class GainModel(dim: int, h_dim: int, loss_alpha: float = 10)

Bases: object

The core model for GAIN Imputation.

Parameters:
  • dim – int Number of features.

  • h_dim – int Size of the hidden layer.

  • loss_alpha – float Hyperparameter for the generator loss.

discr_loss(X: Tensor, M: Tensor, H: Tensor) Tensor

discriminator(X: Tensor, hints: Tensor) Tensor

gen_loss(X: Tensor, M: Tensor, H: Tensor) Tensor

generator(X: Tensor, mask: Tensor) Tensor
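
The two losses are undocumented here, so the following is only a sketch of the objectives described in the GAIN paper, written for a single row in plain Python. It is not the library's implementation. The function names, the flat-list signatures, and the EPS constant are assumptions made for illustration; the real methods take X, M and H tensors.

```python
import math
from typing import List

EPS = 1e-8  # small constant to avoid log(0); an assumption of this sketch


def discr_loss_sketch(mask: List[int], d_prob: List[float]) -> float:
    """Cross-entropy between the true mask and the discriminator's probabilities."""
    terms = [
        m * math.log(p + EPS) + (1 - m) * math.log(1.0 - p + EPS)
        for m, p in zip(mask, d_prob)
    ]
    return -sum(terms) / len(terms)


def gen_loss_sketch(
    x: List[float],
    mask: List[int],
    generated: List[float],
    d_prob: List[float],
    loss_alpha: float = 10.0,
) -> float:
    """Adversarial term on missing entries plus loss_alpha times the
    reconstruction error on observed entries."""
    adversarial = -sum(
        (1 - m) * math.log(p + EPS) for m, p in zip(mask, d_prob)
    ) / len(mask)
    n_observed = max(sum(mask), 1)
    reconstruction = sum(
        m * (xi - gi) ** 2 for xi, m, gi in zip(x, mask, generated)
    ) / n_observed
    return adversarial + loss_alpha * reconstruction


mask = [1, 0]
d_prob = [0.99, 0.01]  # discriminator is confident and correct
d_loss = discr_loss_sketch(mask, d_prob)
g_loss = gen_loss_sketch([1.0, 0.0], mask, [0.5, 0.3], d_prob, loss_alpha=10.0)
g_loss_no_rec = gen_loss_sketch([1.0, 0.0], mask, [0.5, 0.3], d_prob, loss_alpha=0.0)
```

In this sketch a larger loss_alpha pushes the generator to reproduce the observed components more closely, at the expense of the adversarial term.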
class GainPlugin(batch_size: int = 128, n_epochs: int = 100, hint_rate: float = 0.8, loss_alpha: int = 10, random_state: int = 0)

Bases: ImputerPlugin

Imputation plugin for completing missing values using the GAIN strategy.

Method:

Details in the GainImputation class implementation.

Example

>>> import numpy as np
>>> from hyperimpute.plugins.imputers import Imputers
>>> plugin = Imputers().get("gain")
>>> plugin.fit_transform([[1, 1, 1, 1], [np.nan, np.nan, np.nan, np.nan], [1, 2, 2, 1], [2, 2, 2, 2]])

_fit(**kwargs: Any) Any

_transform(**kwargs: Any) Any

static hyperparameter_space(*args: Any, **kwargs: Any) List[Params]

module_relative_path: Optional[Path]

static name() str

plugin

alias of GainPlugin

sample_M(m: int, n: int, p: float) ndarray

Hint Vector Generation

Parameters:
  • m – number of rows

  • n – number of columns

  • p – hint rate

Returns:

generated random values

Return type:

np.ndarray
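
As a rough illustration of hint generation, the sketch below draws an m-by-n binary matrix in plain Python and combines it with the missingness mask, as the GAIN paper describes. It assumes each entry is set to 1.0 with probability p; the library's own thresholding convention may differ. The names sample_hint_mask and apply_hints and the seed parameter are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_hint_mask(m: int, n: int, p: float, seed: int = 0) -> Matrix:
    """Draw an m x n matrix of 0.0/1.0 values; each entry is 1.0 with
    probability p (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(m)]


def apply_hints(b: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product: reveal a mask entry only where the hint bit is 1."""
    return [[bi * mi for bi, mi in zip(b_row, m_row)] for b_row, m_row in zip(b, mask)]


B = sample_hint_mask(3, 4, p=0.9)
mask = [[1.0, 0.0, 1.0, 1.0] for _ in range(3)]  # second column is missing
H = apply_hints(B, mask)
```

The matrix H is what the discriminator would receive as its hint alongside the completed data.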

sample_Z(m: int, n: int) ndarray

Random sample generator for Z.

Parameters:
  • m – number of rows

  • n – number of columns

Returns:

generated random values

Return type:

np.ndarray
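
The noise matrix Z fills the missing positions of the generator input. The sketch below is a plain-Python approximation: the upper bound of 0.01 follows the original GAIN reference code and is an assumption here, as are the names sample_noise and generator_input and the seed parameter.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_noise(m: int, n: int, high: float = 0.01, seed: int = 0) -> Matrix:
    """Draw an m x n matrix of small uniform noise values in [0, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, high) for _ in range(n)] for _ in range(m)]


def generator_input(x: Matrix, mask: Matrix, z: Matrix) -> Matrix:
    """Keep observed values and place noise where the mask is 0."""
    return [
        [m * xi + (1 - m) * zi for xi, m, zi in zip(x_row, m_row, z_row)]
        for x_row, m_row, z_row in zip(x, mask, z)
    ]


x = [[0.5, 0.0], [0.0, 0.3]]     # missing entries stored as 0.0 placeholders
mask = [[1, 0], [0, 1]]          # 1 = observed, 0 = missing
Z = sample_noise(2, 2)
inp = generator_input(x, mask, Z)
```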

sample_idx(m: int, n: int) ndarray

Mini-batch generation

Parameters:
  • m – number of rows

  • n – number of columns

Returns:

generated random indices

Return type:

np.ndarray
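
Mini-batch index generation can be sketched as taking the first n entries of a random permutation of the row indices. This assumes the second argument acts as the number of indices to draw (the batch size), which is how the original GAIN reference code uses it. The name sample_batch_indices and the seed parameter are hypothetical.

```python
import random
from typing import List


def sample_batch_indices(m: int, n: int, seed: int = 0) -> List[int]:
    """Return n distinct row indices drawn from range(m) in random order.

    If n exceeds m, all m indices are returned (slicing never over-runs).
    """
    rng = random.Random(seed)
    permutation = list(range(m))
    rng.shuffle(permutation)
    return permutation[:n]


idx = sample_batch_indices(10, 4)
print(idx)
```

Each training step would then select the rows listed in idx from the data and mask matrices.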