Estimation algorithms

class irtorch.estimation_algorithms.BaseIRTAlgorithm

Bases: ABC

Abstract base class for IRT algorithms. All IRT algorithms should inherit from this class.

abstract fit(model: BaseIRTModel, train_data: torch.Tensor, **kwargs)

Fit the model to the data.

Parameters

model (BaseIRTModel) – The model to train. Needs to inherit irtorch.models.BaseIRTModel.
train_data (torch.Tensor) – The training data.
**kwargs – Additional keyword arguments for the algorithm fit method.
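
The inheritance contract described above can be sketched with stand-in classes. The names SketchAlgorithm and MeanScoreAlgorithm below are hypothetical and are not part of irtorch; the sketch only illustrates that a subclass has to implement fit before it can be instantiated.

```python
from abc import ABC, abstractmethod

import torch


class SketchAlgorithm(ABC):
    """Stand-in for an abstract algorithm base class (not the irtorch class)."""

    @abstractmethod
    def fit(self, model, train_data: torch.Tensor, **kwargs):
        """Fit the model to the data."""


class MeanScoreAlgorithm(SketchAlgorithm):
    """Hypothetical subclass that only records the mean score of each item."""

    def fit(self, model, train_data: torch.Tensor, **kwargs):
        # nanmean skips missing responses that are coded as nan
        self.item_means = torch.nanmean(train_data, dim=0)
        return self


data = torch.tensor([[0.0, 1.0], [1.0, float("nan")]])
algorithm = MeanScoreAlgorithm().fit(model=None, train_data=data)
```

Instantiating SketchAlgorithm directly raises a TypeError because fit is abstract.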

class irtorch.estimation_algorithms.ae.AE

Bases: BaseIRTAlgorithm

Autoencoder neural network for fitting IRT models.

fit(model: BaseIRTModel, train_data: Tensor, one_hot_encoded: bool = True, imputation_method: str | None = None, learning_rate: float = 0.02, learning_rate_updates_before_stopping: int = 3, evaluation_interval_size: int = 80, max_epochs: int = 10000, batch_size: int | None = None, batch_normalization_encoder: bool = False, nonlinear_encoder: Module = ELU(alpha=1.0), hidden_layers_encoder: list[int] | None = None, device: str = 'cpu')

Train an IRT model using the autoencoder. If the algorithm fails to converge, try lowering the learning rate. Use batch_size if the data is too large to fit in memory.

Parameters

model (BaseIRTModel) – The model to fit. Needs to inherit irtorch.models.BaseIRTModel.
train_data (torch.Tensor) – The training data. Item responses should be coded 0, 1, … and missing responses coded as nan or -1.
one_hot_encoded (bool, optional) – Whether or not to one-hot encode the train data as encoder input inside this fit method. (default is True)
imputation_method (str, optional) – The method to use for imputing missing data for the encoder. For options see irtorch.utils.impute_missing(). Only methods that do not rely on a fitted model can be used. Note that missing values are removed from the loss calculation even after imputation. If you do not want this, impute your dataset before fitting. (default is None and only works for one-hot encoded inputs)
learning_rate (float, optional) – The initial learning rate for the optimizer. (default is 0.02)
learning_rate_updates_before_stopping (int, optional) – The number of times the learning rate can be reduced before stopping training. (default is 3)
evaluation_interval_size (int, optional) – The number of iterations between each model evaluation during training. (default is 80)
max_epochs (int, optional) – The maximum number of epochs to train for. (default is 10000)
batch_size (int, optional) – The batch size for training. (default is None and uses the full dataset)
batch_normalization_encoder (bool, optional) – Whether to use batch normalization for the encoder. (default is False)
nonlinear_encoder (torch.nn.Module, optional) – The non-linear function to use after each hidden layer in the encoder. (default is torch.nn.ELU())
hidden_layers_encoder (list[int], optional) – List of hidden layers for the encoder. Each element is the number of neurons in one hidden layer. If not provided, uses one hidden layer with 2 * sum(item_categories) neurons.
device (str, optional) – The device to run the model on. (default is “cuda” if available else “cpu”.)
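
As an illustration of the data coding described for train_data and one_hot_encoded, the following sketch builds a one-hot encoder input by hand. It is not the irtorch implementation: the item_categories list and the choice to turn a missing response into an all-zero block are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

# Responses for 3 respondents on 2 items; nan and -1 both mark missing values.
data = torch.tensor([[0.0, 2.0], [1.0, float("nan")], [-1.0, 1.0]])
item_categories = [2, 3]  # assumed number of response categories per item

missing = torch.isnan(data) | (data == -1)
# Replace missing entries with a valid placeholder code before encoding.
codes = torch.where(missing, torch.zeros_like(data), data).long()

blocks = []
for j, n_categories in enumerate(item_categories):
    block = F.one_hot(codes[:, j], num_classes=n_categories).float()
    # Assumption: a missing response is represented by an all-zero block.
    block[missing[:, j]] = 0.0
    blocks.append(block)

encoder_input = torch.cat(blocks, dim=1)  # shape (3, 2 + 3)

# Size of the single default hidden layer described for hidden_layers_encoder.
default_hidden_size = 2 * sum(item_categories)
```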

theta_scores(data: Tensor)

Get the latent scores from an input dataset using the encoder.

Parameters: data (torch.Tensor) – A 2D tensor with test data. Columns are items and rows are respondents.
Returns: A 2D tensor of latent scores. Rows are respondents and latent variables are columns.
Return type: torch.Tensor

class irtorch.estimation_algorithms.jml.JML

Bases: BaseIRTAlgorithm

Joint Maximum Likelihood (JML) for fitting IRT models [1]. JML optimizes the log-likelihood directly without any latent variable integration or distributional assumptions. Instead of alternating between optimizing the latent variables and the model parameters, as in the typical JML implementation, all parameters are updated at the same time using the Adam optimizer.

This algorithm is not recommended for large datasets.

fit(model: BaseIRTModel, train_data: Tensor, learning_rate: float = 0.1, learning_rate_update_patience: int = 80, learning_rate_updates_before_stopping: int = 3, max_epochs: int = 10000, batch_size: int = None, start_thetas: Tensor = None, device: str = 'cpu')

Train the model using Joint Maximum Likelihood.

Parameters

model (BaseIRTModel) – The model to fit. Needs to inherit irtorch.models.BaseIRTModel.
train_data (torch.Tensor) – The training data. Item responses should be coded 0, 1, … and missing responses coded as nan or -1.
learning_rate (float, optional) – The initial learning rate for the optimizer. (default is 0.1)
learning_rate_update_patience (int, optional) – The number of epochs to wait before reducing the learning rate. (default is 80)
learning_rate_updates_before_stopping (int, optional) – The number of times the learning rate can be reduced before stopping training. (default is 3)
max_epochs (int, optional) – The maximum number of epochs to train for. (default is 10000)
batch_size (int, optional) – The batch size for training. (default is None and uses the full dataset)
start_thetas (torch.Tensor, optional) – The starting thetas for the training. (default is None and uses the standardized sum scores)
device (str, optional) – The device to run the model on. (default is “cuda” if available else “cpu”.)
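
The joint update can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the irtorch code: it uses a simple Rasch-type model with simulated data, and the variable names are made up for the example. Thetas start at the standardized sum scores, and Adam updates thetas and item parameters in the same step.

```python
import torch

torch.manual_seed(0)

# Simulate binary responses from a Rasch-type model (illustrative data only).
n_respondents, n_items = 200, 10
true_theta = torch.randn(n_respondents, 1)
true_difficulty = torch.linspace(-1.5, 1.5, n_items)
data = torch.bernoulli(torch.sigmoid(true_theta - true_difficulty))

# Starting thetas: standardized sum scores.
sum_scores = data.sum(dim=1, keepdim=True)
start_thetas = (sum_scores - sum_scores.mean()) / sum_scores.std()

theta = torch.nn.Parameter(start_thetas.clone())
difficulty = torch.nn.Parameter(torch.zeros(n_items))
optimizer = torch.optim.Adam([theta, difficulty], lr=0.1)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    logits = theta - difficulty
    # Negative joint log-likelihood of all responses.
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, data, reduction="sum"
    )
    loss.backward()
    optimizer.step()  # thetas and item parameters move together
    losses.append(loss.item())
```

Because every respondent carries its own parameters, the number of optimized values grows with the sample size, which is consistent with the note that the algorithm is not recommended for large datasets.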

class irtorch.estimation_algorithms.mml.MML

Bases: BaseIRTAlgorithm

Marginal Maximum Likelihood (MML) for fitting IRT models [3]. Uses a multivariate normal distribution for the latent variables and gradient descent to optimize the model parameters. This method is generally effective for models with a small number of latent variables. Models with more than three latent variables are not supported. Note that this method typically runs much faster on a GPU.

The marginal log-likelihood is calculated by integrating over an assumed normal distribution for the latent variables with density \(f(\mathbf{\theta})\).

\[\log L(\phi) = \sum_{i=1}^N \log \left( \int P(\mathbf{X}_i = \mathbf{x}_i | \mathbf{\theta}, \phi)f(\mathbf{\theta})d\mathbf{\theta} \right)\]

where

\(N\) is the number of respondents,
\(\mathbf{X}_i\) is the response vector of the \(i\)-th respondent,
\(\mathbf{x}_i\) is the observed response vector of the \(i\)-th respondent,
\(\phi\) are the model parameters,
\(\mathbf{\theta}\) is the latent variable vector.

The integral is approximated using Gauss-Hermite quadrature or a quasi-Monte Carlo method. \(\log L(\phi)\) is then maximized using stochastic gradient descent. These steps are repeated until convergence.
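
To make the quadrature step concrete, here is a minimal NumPy sketch of the marginal log-likelihood for a unidimensional two-parameter logistic model with a standard normal latent variable. The function name, the model choice, and the parameter names are assumptions for illustration; the sketch does not handle missing responses and is not the irtorch implementation.

```python
import numpy as np


def marginal_log_likelihood(data, discrimination, difficulty, n_points=21):
    """Approximate log L(phi) with Gauss-Hermite quadrature, theta ~ N(0, 1)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    theta = np.sqrt(2.0) * nodes        # change of variables for N(0, 1)
    weights = weights / np.sqrt(np.pi)  # rescaled weights sum to 1

    # P(X_ij = 1 | theta_k), shape (n_points, n_items)
    prob = 1.0 / (1.0 + np.exp(-discrimination * (theta[:, None] - difficulty)))
    # log P(x_i | theta_k), shape (n_respondents, n_points)
    log_lik = data @ np.log(prob).T + (1.0 - data) @ np.log1p(-prob).T
    # Weighted sum over nodes approximates the integral for each respondent.
    marginal = np.exp(log_lik) @ weights
    return np.sum(np.log(marginal))


data = np.array([[1.0, 0.0], [1.0, 1.0]])
log_l = marginal_log_likelihood(data, np.array([1.0, 1.5]), np.array([0.0, 0.5]))
```

The node grid grows exponentially with the number of latent variables when extended to several dimensions, which is one reason quadrature is usually limited to low-dimensional models.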

fit(model: BaseIRTModel, train_data: Tensor, max_epochs: int = 1000, integration_method: str = 'quasi_mc', quadrature_points: int | None = None, covariance_matrix: torch.Tensor | None = None, estimate_covariance: bool = False, learning_rate: float = 0.2, learning_rate_update_patience: int = 7, learning_rate_updates_before_stopping: int = 2, device: str = 'cpu')

Train the model.

Parameters

model (BaseIRTModel) – The model to fit. Needs to inherit irtorch.models.BaseIRTModel.
train_data (torch.Tensor) – The training data. Item responses should be coded 0, 1, … and missing responses coded as nan or -1.
max_epochs (int, optional) – The maximum number of epochs to train for. (default is 1000)
integration_method (str, optional) – The method to use for approximating integrals over the latent variables. Can be either “gauss_hermite” for Gauss-Hermite quadrature or “quasi_mc” for quasi-Monte Carlo. (default is “quasi_mc”).
quadrature_points (int, optional) – The number of quadrature points to use for latent variable integration. Note that large datasets may lead to memory issues if the number of quadrature points is too high. (default is None and uses a function of the number of latent variables)
covariance_matrix (torch.Tensor, optional) – The covariance matrix for the multivariate normal distribution for the latent variables. (default is None and uses uncorrelated variables)
estimate_covariance (bool, optional) – Whether to estimate the latent variable covariance matrix during fitting using an EM-style update. Only supported with integration_method="quasi_mc" and requires more than one latent variable. When enabled, the off-diagonal elements (correlations) are updated each epoch while the diagonal is fixed to 1 for identification. (default is False)
learning_rate (float, optional) – The initial learning rate for the optimizer. (default is 0.2)
learning_rate_update_patience (int, optional) – The number of epochs to wait before reducing the learning rate. (default is 7)
learning_rate_updates_before_stopping (int, optional) – The number of times the learning rate can be reduced before stopping training. (default is 2)
device (str, optional) – The device to run the model on. (default is “cuda” if available else “cpu”.)
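
One plausible reading of how learning_rate_update_patience and learning_rate_updates_before_stopping interact is sketched below. This is an assumption, not the irtorch implementation: the function name is hypothetical, and the reduction factor of 0.1 is chosen only for the example.

```python
def run_schedule(losses, learning_rate=0.2, patience=7, max_updates=2, factor=0.1):
    """Walk through epoch losses and return (epochs_run, final_learning_rate).

    Assumed rule: after `patience` epochs without improvement the learning
    rate is multiplied by `factor`; once it has been reduced `max_updates`
    times, the next plateau stops training.
    """
    best = float("inf")
    epochs_without_improvement = 0
    updates = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            if updates >= max_updates:
                return epoch, learning_rate  # stop training
            learning_rate *= factor
            updates += 1
            epochs_without_improvement = 0
    return len(losses), learning_rate


# A loss that never improves after the first epoch.
epochs_run, final_lr = run_schedule([1.0] * 31)
```

Under this reading, a larger patience tolerates longer plateaus before each reduction, while a larger number of allowed updates postpones the stopping decision.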

class irtorch.estimation_algorithms.vae.VAE

Bases: AE

Variational autoencoder neural network with importance weighted sampling for fitting IRT models [15]. This method is effective when fitting high-dimensional IRT models with large datasets.

fit(model: BaseIRTModel, train_data: Tensor, one_hot_encoded: bool = True, imputation_method: str | None = None, learning_rate: float = 0.002, learning_rate_updates_before_stopping: int = 2, evaluation_interval_size: int = 60, max_epochs: int = 10000, batch_size: int | None = None, batch_normalization_encoder: bool = False, nonlinear_encoder=ELU(alpha=1.0), hidden_layers_encoder: list[int] | None = None, device: str = 'cpu', anneal: bool = True, annealing_iterations: int = 5, iw_samples: int = 5)

Train an IRT model using the variational autoencoder. If the algorithm fails to converge, try lowering the learning rate. Use batch_size if the data is too large to fit in memory.

Parameters

model (BaseIRTModel) – The model to fit. Needs to inherit irtorch.models.BaseIRTModel.
train_data (torch.Tensor) – The training data. Item responses should be coded 0, 1, … and missing responses coded as nan or -1.
one_hot_encoded (bool, optional) – Whether the model uses one-hot encoded data. (default is True)
imputation_method (str, optional) – The method to use for imputing missing data for the encoder. For options see irtorch.utils.impute_missing(). Only methods that do not rely on a fitted model can be used. Note that missing values are removed from the loss calculation even after imputation. If you do not want this, impute your dataset before fitting. (default is None and only works for one-hot encoded inputs)
learning_rate (float, optional) – The initial learning rate for the optimizer. (default is 0.002)
learning_rate_updates_before_stopping (int, optional) – The number of times the learning rate can be reduced before stopping training. (default is 2)
evaluation_interval_size (int, optional) – The number of iterations between each model evaluation during training. (default is 60)
max_epochs (int, optional) – The maximum number of epochs to train for. (default is 10000)
batch_size (int, optional) – The batch size for training. (default is None and uses the full dataset)
batch_normalization_encoder (bool, optional) – Whether to use batch normalization for the encoder. (default is False)
nonlinear_encoder (torch.nn.Module, optional) – The non-linear function to use after each hidden layer in the encoder. (default is torch.nn.ELU())
hidden_layers_encoder (list[int], optional) – List of hidden layers for the encoder. Each element is the number of neurons in one hidden layer. If not provided, uses one hidden layer with 2 * sum(item_categories) neurons.
device (str, optional) – The device to run the model on. (default is “cuda” if available else “cpu”.)
anneal (bool, optional) – Whether to anneal the KL divergence. (default is True)
annealing_iterations (int, optional) – The number of iterations to anneal the KL divergence. (default is 5)
iw_samples (int, optional) – The number of importance weighted samples to use. (default is 5)
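
The roles of iw_samples and the annealing parameters can be illustrated with a generic importance-weighted bound. The sketch below is an assumption-laden illustration rather than the irtorch code: the function names are hypothetical, and the linear warm-up of the KL weight is one common annealing choice.

```python
import math

import torch


def iw_bound(log_p_x_given_z, log_p_z, log_q_z, kl_weight=1.0):
    """Importance-weighted bound for each respondent.

    All inputs have shape (iw_samples, batch). kl_weight in [0, 1] scales the
    log p(z) - log q(z | x) term, which anneals the KL part of the objective.
    """
    log_w = log_p_x_given_z + kl_weight * (log_p_z - log_q_z)
    n_samples = log_w.shape[0]
    # log of the average importance weight, computed stably
    return torch.logsumexp(log_w, dim=0) - math.log(n_samples)


def kl_weight_at(iteration, annealing_iterations=5):
    """Linear warm-up of the KL weight from 0 to 1 (assumed schedule)."""
    return min(1.0, iteration / annealing_iterations)


torch.manual_seed(0)
log_p_x = -torch.rand(5, 3)  # made-up log-likelihood terms
log_p_z = -torch.rand(5, 3)
log_q_z = -torch.rand(5, 3)
bound = iw_bound(log_p_x, log_p_z, log_q_z, kl_weight=kl_weight_at(2))
```

With a single sample the expression reduces to the usual evidence lower bound; averaging the weights inside the logarithm gives a bound that is at least as tight.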

latent_credible_interval(input_data: Tensor, alpha=0.05) → tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Get the credible interval for the latent variables.

Parameters

input_data (torch.Tensor) – The input data.
alpha (float, optional) – One minus the credible level, so alpha = 0.05 gives a 95% interval. (default is 0.05)

Returns

The lower bound, mean, and upper bound of the credible interval.

Return type

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
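
If the approximate posterior of each latent variable is treated as normal, an interval of this form can be computed from a posterior mean and standard error. The helper below is a hypothetical sketch under that normality assumption and does not claim to reproduce how latent_credible_interval works internally.

```python
import torch


def normal_interval(mean, se, alpha=0.05):
    """Return (lower, mean, upper) for a normal posterior approximation."""
    # Standard normal quantile for a two-sided interval with level 1 - alpha.
    z = torch.distributions.Normal(0.0, 1.0).icdf(torch.tensor(1.0 - alpha / 2.0))
    return mean - z * se, mean, mean + z * se


posterior_mean = torch.tensor([[0.3], [-1.2]])  # made-up values
posterior_se = torch.tensor([[0.4], [0.5]])
lower, middle, upper = normal_interval(posterior_mean, posterior_se)
```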

latent_mean_se(input_data: Tensor) → tuple[torch.Tensor, torch.Tensor]

Get the posterior mean and standard error of the latent variables.

Parameters: input_data (torch.Tensor) – The input data.
Returns: The posterior mean and standard error of the latent variables.
Return type: tuple[torch.Tensor, torch.Tensor]

sample_latent_variables(model: BaseIRTModel, sample_size: int, input_data: torch.Tensor | None = None)

Sample latent variables from the encoder.

Parameters

model (BaseIRTModel) – The model.
sample_size (int) – The number of samples to draw.
input_data (torch.Tensor, optional) – The data to sample from. If None, uses the training data. (default is None)

Returns

A 2D tensor of latent variables. Rows are samples and columns are latent variables.

Return type

torch.Tensor

theta_scores(data: Tensor)

Get the latent scores from an input dataset using the encoder.

Parameters: data (torch.Tensor) – A 2D tensor with test data. Columns are items and rows are respondents.
Returns: A 2D tensor of latent scores. Rows are respondents and columns are latent variables.
Return type: torch.Tensor