Rescaling

class irtorch.rescale.Scale(invertible: bool = False)

Bases: ABC

Abstract base class for Item Response Theory model scale transformations. All scale transformations should inherit from this class.

Note that you can make custom transformations by inheriting from this class. A class instance can then be supplied to irtorch.models.BaseIRTModel.add_scale_transformation() to apply the transformation to the latent variables of the model.
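As a rough illustration of the contract a custom transformation has to satisfy, the sketch below mirrors the three abstract methods with a local stand-in base class and plain nested lists instead of torch.Tensor, so it runs without irtorch installed. The names ScaleSketch, AffineScale, slopes and intercepts are hypothetical; a real implementation would inherit from irtorch.rescale.Scale and operate on 2D tensors.

```python
from abc import ABC, abstractmethod


class ScaleSketch(ABC):
    """Local stand-in for the Scale interface (not the irtorch class)."""

    @abstractmethod
    def transform(self, theta): ...

    @abstractmethod
    def inverse(self, transformed_theta): ...

    @abstractmethod
    def jacobian(self, theta): ...


class AffineScale(ScaleSketch):
    """Hypothetical scale: column d is mapped to slopes[d] * theta + intercepts[d]."""

    def __init__(self, slopes, intercepts):
        self.slopes = slopes
        self.intercepts = intercepts

    def transform(self, theta):
        return [
            [a * t + b for t, a, b in zip(row, self.slopes, self.intercepts)]
            for row in theta
        ]

    def inverse(self, transformed_theta):
        return [
            [(z - b) / a for z, a, b in zip(row, self.slopes, self.intercepts)]
            for row in transformed_theta
        ]

    def jacobian(self, theta):
        # One (latent variables x latent variables) matrix per row of theta.
        # Each output column depends only on its own input column, so the
        # matrix is diagonal with the slopes on the diagonal.
        dims = len(self.slopes)
        diag = [
            [self.slopes[i] if i == j else 0.0 for j in range(dims)]
            for i in range(dims)
        ]
        return [[r[:] for r in diag] for _ in theta]


scale = AffineScale(slopes=[10.0, 2.0], intercepts=[50.0, 0.0])
theta = [[0.0, 1.0], [-1.5, 0.5]]
transformed = scale.transform(theta)
restored = scale.inverse(transformed)
jacobians = scale.jacobian(theta)
```

The nested list returned by jacobian() has the same (theta rows, latent variables, latent variables) layout that the methods below describe for tensors.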

abstract inverse(transformed_theta: Tensor) → Tensor

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

abstract jacobian(theta: Tensor) → Tensor

Computes the Jacobian matrix of the scale transformations for each row in the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A torch tensor with the Jacobians for each theta score. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians for each row.
Return type: torch.Tensor

abstract transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.

class irtorch.rescale.Bit(model: BaseIRTModel, population_theta: torch.Tensor | None = None, start_theta: torch.Tensor | None = None, items: list[int] | None = None, grid_points: int = 4000, mc_start_theta_approx: bool = False, **kwargs)

Bases: Scale

Bit scale transformation, as introduced by Wallmark and Wiberg [16].

Parameters

model (BaseIRTModel) – The IRT model to use for bit scale computation.
population_theta (torch.Tensor, optional) – Theta scores from the population. Usually the training data. Used to find good starting values for the grid of theta scores, which are then used for the bit transformation. Recommended to use for models with theta distributions for which values far from 0 are common. (default is None)
start_theta (torch.Tensor, optional) – The starting theta scores for the bit scale computation. If None, the minimum theta scores are used. (default is None)
items (list[int], optional) – The item indices for the items to use to compute the bit scores. (default is None and uses all items)
grid_points (int, optional) – The number of grid points used to compute the bit scores. More points lead to more accurate results. (default is 4000)
mc_start_theta_approx (bool, optional) – For multiple choice models. Whether to approximate the starting theta scores using simulated random guesses. If True, runs bit_score_starting_theta_mc(). (default is False)
**kwargs – Additional keyword arguments for the starting theta approximation method. See bit_score_starting_theta_mc().

Notes

First, item bit scores for each item \(j\) are computed from \(\mathbf{\theta}\) scores as follows:

\[\begin{equation} \begin{aligned} B_j(\mathbf{\theta})= \int_{t=\mathbf{\theta}^{(0)}}^{\mathbf{\theta}} \left|\frac{dH_j(t)}{dt}\right| dt. \end{aligned} \end{equation}\]

where

\(\mathbf{\theta}^{(0)}\) is the minimum \(\mathbf{\theta}\)
\(H_j(\mathbf{\theta})\) is entropy for item \(j\) as a function of \(\mathbf{\theta}\)

The total bit scores \(B(\mathbf{\theta})\) are then the sum of the item scores:

\[\begin{equation} \begin{aligned} B(\mathbf{\theta}) = \sum_{j=1}^{J} B_j(\mathbf{\theta}). \end{aligned} \end{equation}\]
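The two formulas above can be approximated numerically on a grid. The following is a minimal sketch for the unidimensional case, assuming dichotomous two-parameter logistic items, entropy measured in bits (log base 2) and a fixed starting value of -6; item_entropy, bit_score and the item parameters are hypothetical and are not the library's implementation.

```python
import math


def item_entropy(theta, a, b):
    """Entropy in bits of a dichotomous 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def bit_score(theta, items, start_theta=-6.0, grid_points=4000):
    """Approximate B(theta) by summing |H_j(t_{k+1}) - H_j(t_k)| on a grid."""
    if theta <= start_theta:
        return 0.0
    step = (theta - start_theta) / grid_points
    grid = [start_theta + k * step for k in range(grid_points + 1)]
    total = 0.0
    for a, b in items:
        entropies = [item_entropy(t, a, b) for t in grid]
        # Absolute entropy changes between neighbouring grid points
        # approximate the integral of |dH_j(t)/dt|.
        total += sum(abs(h1 - h0) for h0, h1 in zip(entropies, entropies[1:]))
    return total


items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (slope, location) pairs
scores = [bit_score(t, items) for t in (-6.0, -1.0, 0.0, 1.0, 3.0)]
```

Because absolute entropy changes are accumulated, the resulting score never decreases in theta, even though the entropy of each item rises and then falls again.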

Examples

>>> import irtorch
>>> from irtorch.models import MonotoneNN
>>> from irtorch.estimation_algorithms import MML
>>> from irtorch.rescale import Bit
>>> data, mc_correct = irtorch.load_dataset.swedish_sat_quantitative()
>>> model = MonotoneNN(data, mc_correct=mc_correct)
>>> model.fit(train_data=data, algorithm=MML())
>>> thetas = model.latent_scores(data)
>>> # Initialize the scale transformation
>>> # mc_start_theta_approx sets the starting theta to the approximate score of a randomly guessing respondent
>>> bit = Bit(model, population_theta=thetas, mc_start_theta_approx=True)
>>> # Supply the new scale to the model
>>> model.add_scale_transformation(bit)
>>> # Estimate thetas on the transformed scale
>>> rescaled_thetas = model.latent_scores(data)
>>> # Or alternatively by directly converting the old ones
>>> rescaled_thetas = model.transform_theta(thetas)
>>> # Plot the differences
>>> model.plot.latent_score_distribution(thetas).show()
>>> model.plot.latent_score_distribution(rescaled_thetas).show()
>>> # Plot an item on the bit transformed scale
>>> model.plot.item_probabilities(1).show()

bit_score_starting_theta_mc(theta_estimation: str = 'ML', ml_map_device: str = 'cpu', lbfgs_learning_rate: float = 0.25, items: list[int] | None = None, guessing_probabilities: list[float] | None = None, guessing_iterations: int = 10000)

For multiple choice models, approximate the starting theta score \(\mathbf{\theta}^{(0)}\) from which to compute bit scores. See notes under bit_scores() for the bit score formula.

Parameters

theta_estimation (str, optional) – Method used to obtain the theta scores. Can be ‘NN’, ‘ML’, ‘EAP’ or ‘MAP’ for neural network, maximum likelihood, expected a posteriori or maximum a posteriori respectively. (default is ‘ML’)
ml_map_device (str, optional) – For ML and MAP. The device to use for computation. Can be ‘cpu’ or ‘cuda’. (default is ‘cpu’)
lbfgs_learning_rate (float, optional) – For ML and MAP. The learning rate to use for the LBFGS optimizer. (default is 0.25)
items (list[int], optional) – The item indices for the items to use to compute the bit scores. (default is None and uses all items)
guessing_probabilities (list[float], optional) – The guessing probability for each item. The same length as the number of items. Guessing is not supported for polytomously scored items and the probabilities for them will be ignored. (default is None and uses no guessing or, for multiple choice models, 1 over the number of item categories)
guessing_iterations (int, optional) – The number of iterations to use for approximating a minimum theta when guessing is incorporated. (default is 10000)

Returns

A tensor with all the starting theta values.

Return type

torch.Tensor

inverse(transformed_theta)

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the gradients of the bit scores with respect to the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A torch tensor with the gradients for each theta score. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

set_start_theta(start_theta: Tensor)

Sets the starting theta scores for the bit scale computation.

Parameters: start_theta (torch.Tensor) – The starting theta scores for the bit scale computation.

transform(theta: Tensor) → Tensor

Transforms \(\mathbf{\theta}\) scores into bit scores \(B(\mathbf{\theta})\).

Parameters: theta (torch.Tensor) – A 2D tensor. Columns are latent variables and rows are respondents.
Returns: A 2D tensor with bit score scale scores for each respondent across the rows together with another tensor with start_theta.
Return type: torch.Tensor

transform_to_1D(theta: Tensor) → Tensor

Transforms \(\mathbf{\theta}\) scores of a multi-dimensional model into one-dimensional bit scores \(B(\mathbf{\theta})\). Equivalent to transform() for one-dimensional models.

Parameters: theta (torch.Tensor) – A 2D tensor. Columns are latent variables and rows are respondents.
Returns: A 2D tensor with bit score scale scores for each respondent across the rows together with another tensor with start_theta.
Return type: torch.Tensor

class irtorch.rescale.Flow(latent_variables: int)

Bases: Scale

Normalizing flow transformation of IRT theta scales using rational quadratic splines as per Durkan et al. [7]. Supports gradient computation and the transformation is invertible.

Parameters: latent_variables (int) – The number of latent variables.
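To make the idea of a normalizing flow concrete, here is a deliberately simplified sketch that swaps the rational quadratic spline for an affine map z = (theta - shift) / scale with a standard normal target. It shows the change-of-variables log-likelihood that a flow maximizes; for the affine case the maximizer is known in closed form (sample mean and population standard deviation), so no optimizer is needed. All names are hypothetical and this is not how Flow.fit() is implemented.

```python
import math
import random
import statistics


def flow_log_likelihood(thetas, shift, scale):
    """Mean log-density of thetas under z = (theta - shift) / scale, z ~ N(0, 1)."""
    log_det = -math.log(scale)  # log |dz / dtheta|
    total = 0.0
    for theta in thetas:
        z = (theta - shift) / scale
        total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) + log_det
    return total / len(thetas)


random.seed(0)
# Hypothetical theta scores with mean 1 and standard deviation 2.
thetas = [random.gauss(1.0, 2.0) for _ in range(2000)]

# For an affine flow the maximum likelihood solution is available in closed form.
shift = statistics.fmean(thetas)
scale = statistics.pstdev(thetas)

fitted = flow_log_likelihood(thetas, shift, scale)
unfitted = flow_log_likelihood(thetas, 0.0, 1.0)
rescaled = [(theta - shift) / scale for theta in thetas]
```

A spline-based flow optimizes the same kind of objective, but with a far more flexible monotone map, so it can also remove skewness and multimodality that an affine map cannot.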

Examples

>>> import irtorch
>>> from irtorch.models import GradedResponse
>>> from irtorch.estimation_algorithms import MML
>>> from irtorch.rescale import Flow
>>> data = irtorch.load_dataset.swedish_national_mathematics_1()
>>> model = GradedResponse(data)
>>> model.fit(train_data=data, algorithm=MML())
>>> thetas = model.latent_scores(data)
>>> # Initialize and fit the flow scale transformation. Supply it to the model.
>>> flow = Flow(1)
>>> flow.fit(thetas)
>>> model.add_scale_transformation(flow)
>>> # Estimate thetas on the transformed scale
>>> rescaled_thetas = model.latent_scores(data)
>>> # Or alternatively by directly converting the old ones
>>> rescaled_thetas = model.transform_theta(thetas)
>>> # Plot the differences
>>> model.plot.latent_score_distribution(thetas).show()
>>> model.plot.latent_score_distribution(rescaled_thetas).show()
>>> # Put the thetas back to the original scale
>>> original_thetas = model.inverse_transform_theta(rescaled_thetas)
>>> # Plot an item on the flow transformed scale
>>> model.plot.item_probabilities(1).show()

fit(theta: Tensor, transformation: irtorch.torch_modules.rational_quadratic_spline.RationalQuadraticSpline | None = None, distribution: torch.distributions.distribution.Distribution | None = None, batch_size: int | None = None, learning_rate: float = 0.01, learning_rate_updates_before_stopping: int = 2, evaluation_interval_size: int = 50, max_epochs: int = 1500, device: str = 'cpu', **kwargs)

Fits the normalizing flow to the data. Typically used from within an IRT model instance. Use batch_size if the data is too large to fit in memory.

Parameters

theta (torch.Tensor) – A 2D tensor containing latent variable theta scores of the population. Usually the training data. Each column represents one latent variable.
transformation (RationalQuadraticSpline, optional) – The transformation to apply to the data.
distribution (Distribution, optional) – The distribution to apply to the latent variables. If None, a standard normal distribution is used.
batch_size (int, optional) – The batch size for the data loader. (default is None and uses the full dataset)
learning_rate (float, optional) – The learning rate for the optimizer. (default is 0.01)
learning_rate_updates_before_stopping (int, optional) – The number of learning rate updates before stopping the training. (default is 2)
evaluation_interval_size (int, optional) – The number of iterations between each model evaluation during training. (default is 50)
max_epochs (int, optional) – The maximum number of epochs to train the flow. (default is 1500)
device (str, optional) – The device to use for the computation. (default is “cpu”)
**kwargs – Additional keyword arguments for irtorch.torch_modules.RationalQuadraticSpline constructor. By default, the spline is set to have 50 bins and the input bounds are set to -5.5 and 5.5 and output bounds to -3.0 and 3.0.

inverse(transformed_theta: Tensor) → Tensor

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the Jacobian of scale scores for each \(j\) with respect to the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A tensor with the Jacobian for each input row. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.

class irtorch.rescale.LinkCommonItems(model_from: BaseIRTModel, model_to: BaseIRTModel, model_from_common_item_indices: list[int], model_to_common_item_indices: list[int], method: str = 'spline', inverted: bool = False, **kwargs)

Bases: Scale

Link theta scales from two different IRT models to the same scale using common (anchor) items. Either rational quadratic splines [7] or monotonic neural networks [12] can be used to link the scales. Currently only supports unidimensional models.

Parameters

model_from (BaseIRTModel) – The IRT model whose scale is to be transformed.
model_to (BaseIRTModel) – The scale of model_from will be linked to the scale of model_to.
model_from_common_item_indices (list[int]) – The indices of the items in model_from that are also in model_to (first item is index 0).
model_to_common_item_indices (list[int]) – The indices of the items in model_to that are also in model_from (first item is index 0).
method (str, optional) – The method to use for linking the scales. Either “spline” or “neuralnet”. Default is “spline”. Note that the splines use a fixed range of -5.5 to 5.5 for input values and a learned output range with initial values of -5.5 to 5.5. If latent scores outside this range are common for your models, you may need to adjust the bounds. See irtorch.torch_modules.RationalQuadraticSpline for more information.
inverted (bool, optional) – Set to true if the theta scale of one model is inverted. Default is False.
**kwargs – Additional keyword arguments for the irtorch.torch_modules.RationalQuadraticSpline constructor when method is “spline”. By default, the spline is set to have 50 bins and the input bounds are set to -5.5 and 5.5 and output bounds to -3.0 and 3.0. When method is “neuralnet”, the number of neurons in the hidden layer can be set with the neurons argument. The number of neurons must be divisible by 3. Default is 9.

Notes

We have two models fitted using data from two different populations, P and Q. We also have some items in common between the models for the purpose of linking. Let \(\theta_P\) and \(\theta_Q\) be points from the latent trait scales from the models fitted to P and Q respectively. Our goal is to find a linking function \(g\left(\theta_P\right)\) which takes a \(\theta_P\) and outputs the equivalent \(\theta_Q\). This is done by finding the \(g\left(\theta_P\right)\) that minimizes the expected KL divergence between the item curves of the common items under the two models:

\[\int \sum_{j \in \text { common }} \sum_x D_{K L}\left[P_P\left(X_j=x|\theta_P\right) \Vert P_Q \left(X_j=x| \theta_Q = g\left(\theta_P \right) \right)\right] f(\theta_P) d \theta_P\]

\(P_P\left(X_j=x|\theta_P\right)\) is the probability for a score \(x\) on item \(j\) from the model fitted to population P given \(\theta_P\).
\(P_Q \left(X_j=x| \theta_Q = g\left(\theta_P \right)\right)\) is the probability for a score \(x\) on item \(j\) from the model fitted to population Q given \(\theta_Q = g\left(\theta_P \right)\).
\(f(\theta_P)\) is the density of the latent trait distribution in population P.
The sums are over the common items and their possible responses.
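The objective above can be evaluated numerically for any candidate linking function. The sketch below does this for dichotomous two-parameter logistic items, assuming a standard normal density for \(f(\theta_P)\) and a simple Riemann sum for the integral; p_correct, kl_bernoulli, linking_loss and the item parameters are hypothetical. The P-scale items are constructed so that the true link is theta_Q = 1.2 * theta_P + 0.5, which should therefore give a loss of essentially zero.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def kl_bernoulli(p, q):
    """KL divergence between two Bernoulli distributions, summed over x in {0, 1}."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def linking_loss(g, items_p, items_q, grid_points=201, bound=4.0):
    """Riemann-sum approximation of the linking objective with f = N(0, 1)."""
    step = 2.0 * bound / (grid_points - 1)
    loss = 0.0
    for k in range(grid_points):
        theta_p = -bound + k * step
        density = math.exp(-0.5 * theta_p**2) / math.sqrt(2.0 * math.pi)
        for (a_p, b_p), (a_q, b_q) in zip(items_p, items_q):
            p = p_correct(theta_p, a_p, b_p)
            q = p_correct(g(theta_p), a_q, b_q)
            loss += kl_bernoulli(p, q) * density * step
    return loss


# Hypothetical common items on the Q scale, and the same items expressed on a
# P scale that satisfies theta_Q = 1.2 * theta_P + 0.5.
items_q = [(1.0, -0.5), (1.4, 0.3), (0.9, 1.0)]
items_p = [(a * 1.2, (b - 0.5) / 1.2) for a, b in items_q]

loss_true_link = linking_loss(lambda t: 1.2 * t + 0.5, items_p, items_q)
loss_identity = linking_loss(lambda t: t, items_p, items_q)
```

In the class itself, \(g\) is a spline or a monotonic neural network whose parameters are learned by minimizing a loss of this kind, rather than being chosen from a fixed set of candidates.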

Examples

>>> import irtorch
>>> from irtorch.rescale import LinkCommonItems
>>> from irtorch.models import ThreeParameterLogistic
>>> from irtorch.estimation_algorithms import JML, MML
>>> data = irtorch.load_dataset.swedish_sat_binary()[:, :80]
>>> # As an illustration, we split the dataset into two parts and use 20 common items.
>>> # In practice, we would of course use different datasets for each model.
>>> data1 = data[:2500, :50]
>>> data2 = data[2500:, 30:]
>>> model1 = ThreeParameterLogistic(items=50)
>>> model2 = ThreeParameterLogistic(items=50)
>>> model1.fit(train_data=data1, algorithm=MML())
>>> model2.fit(train_data=data2, algorithm=JML())
>>> # Link the scale of model 2 to the model 1 scale using common items.
>>> link = LinkCommonItems(model2, model1, list(range(20)), list(range(30, 50)))
>>> link.fit(theta_from = model2.latent_scores(data2), learning_rate=0.01, max_epochs=1000)
>>> model2.add_scale_transformation(link)
>>> # Plot the transformation
>>> model2.plot.scale_transformations(input_theta_range=(-5, 5)).show()

fit(theta_from: Tensor, batch_size: int | None = None, learning_rate: float = 0.01, learning_rate_updates_before_stopping: int = 1, evaluation_interval_size: int = 50, max_epochs: int = 1000, device: str = 'cpu')

Fits the linking transformation to the data. Typically used from within an IRT model instance. Use batch_size if the data is too large to fit in memory.

Parameters

theta_from (torch.Tensor) – A 2D tensor containing latent variable theta scores from the model whose theta scale we are transforming (model_from). Usually the training data, representing the population. Each column represents one latent variable.
batch_size (int, optional) – The batch size for the data loader. Default is None and uses no batches.
learning_rate (float, optional) – The learning rate for the optimizer. Default is 0.01.
learning_rate_updates_before_stopping (int, optional) – The number of learning rate updates before stopping the training. Default is 1.
evaluation_interval_size (int, optional) – The number of iterations between each model evaluation during training. (default is 50)
max_epochs (int, optional) – The maximum number of epochs to train the linking transformation. Default is 1000.
device (str, optional) – The device to use for the computation. Default is “cpu”.

inverse(transformed_theta: Tensor) → Tensor

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the Jacobian of scale scores for each \(j\) with respect to the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A tensor with the Jacobian for each input row. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.

class irtorch.rescale.RankCDF(theta: Tensor, distributions: list[torch.distributions.distribution.Distribution] | None = None)

Bases: Scale

Rank-based inverse CDF transformation of IRT theta scales.

For each latent variable, finds the rank of each theta score in the population. For new data, finds the closest matching population ranks and uses the inverse CDF to find the equivalent scores in the chosen distribution(s).

Note that while this method is fast, it relies heavily on the input theta scores covering the entire range of the distribution(s). It is also not invertible, does not support gradient computation, and is not one-to-one.

Parameters

theta (torch.Tensor) – A large tensor of theta scores representing the population.
distributions (list[torch.distributions.Distribution], optional) – The distributions to use for the transformation of each latent variable. If None, normal distributions are used.
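A minimal sketch of the rank-based idea for a single latent variable, using only the Python standard library. The rank rule (the number of population scores less than or equal to the new score) and the rank / (n + 1) probability are assumptions made for illustration and may differ from what RankCDF does internally; rank_cdf_transform is a hypothetical helper.

```python
import bisect
from statistics import NormalDist


def rank_cdf_transform(population, new_thetas, distribution=None):
    """Map scores to the chosen distribution via their rank in the population."""
    distribution = distribution or NormalDist()
    ordered = sorted(population)
    n = len(ordered)
    transformed = []
    for theta in new_thetas:
        # Number of population scores that are <= theta.
        rank = bisect.bisect_right(ordered, theta)
        # Keep the probability strictly inside (0, 1) so inv_cdf stays finite.
        probability = min(max(rank / (n + 1), 1.0 / (n + 1)), n / (n + 1))
        transformed.append(distribution.inv_cdf(probability))
    return transformed


population = [0.1 * k for k in range(-30, 31)]  # hypothetical population thetas
rescaled = rank_cdf_transform(population, [-10.0, -1.0, 0.0, 1.0, 1.04, 10.0])
```

The inputs 1.0 and 1.04 share the same population rank and therefore receive the same transformed score, and scores far outside the population range are clamped to the most extreme ranks. Both effects illustrate why the mapping is not one-to-one and why the population sample needs to cover the whole range.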

Examples

>>> import irtorch
>>> from irtorch.models import GradedResponse
>>> from irtorch.estimation_algorithms import MML
>>> from irtorch.rescale import RankCDF
>>> data = irtorch.load_dataset.swedish_national_mathematics_1()
>>> model = GradedResponse(data)
>>> model.fit(train_data=data, algorithm=MML())
>>> thetas = model.latent_scores(data)
>>> # Create a RankCDF instance and supply it to the model.
>>> model.add_scale_transformation(RankCDF(thetas))
>>> # Estimate thetas on the transformed scale
>>> rescaled_thetas = model.latent_scores(data)
>>> # Or alternatively by directly converting the old ones
>>> rescaled_thetas = model.transform_theta(thetas)
>>> # Plot the differences
>>> model.plot.latent_score_distribution(thetas).show()
>>> model.plot.latent_score_distribution(rescaled_thetas).show()
>>> # Plot an item on the transformed scale
>>> model.plot.item_probabilities(1).show()

inverse(transformed_theta)

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the gradients of scale scores with respect to the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A torch tensor with the gradients for each theta score. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing the transformed theta scores.
Return type: torch.Tensor

class irtorch.rescale.Reverse(reversed_latent_variables: list[bool])

Bases: Scale

Reverses the chosen theta scales.

Parameters: reversed_latent_variables (list[bool]) – A list of booleans indicating which latent variables to reverse.
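As a small sketch of what reversing means here, assuming that it amounts to a sign flip of the flagged columns (the source does not spell out the operation), the helper below works on plain nested lists rather than tensors. reverse_scales is a hypothetical name.

```python
def reverse_scales(theta, reversed_latent_variables):
    """Flip the sign of every column flagged True; applying it twice is the identity."""
    signs = [-1.0 if flag else 1.0 for flag in reversed_latent_variables]
    return [[s * t for s, t in zip(signs, row)] for row in theta]


theta = [[0.5, -1.0], [2.0, 0.25]]
reversed_theta = reverse_scales(theta, [True, False])
round_trip = reverse_scales(reversed_theta, [True, False])
```

Under this assumption the transformation is its own inverse, and its Jacobian is a diagonal matrix with -1 for reversed latent variables and 1 for the others.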

Examples

>>> import irtorch
>>> from irtorch.models import NominalResponse, MonotoneNN
>>> from irtorch.estimation_algorithms import AE, MML
>>> from irtorch.rescale import Reverse
>>> irtorch.set_seed(15)
>>> data_sat, correct_responses = irtorch.load_dataset.swedish_sat_verbal()
>>> model = NominalResponse(data=data_sat, mc_correct=correct_responses)
>>> model.fit(train_data=data_sat, algorithm=MML())
>>> model.plot.item_probabilities(1).show()
>>> # reverse the first (and only) latent variable
>>> reverse = Reverse([True])
>>> model.add_scale_transformation(reverse)
>>> model.plot.item_probabilities(1).show()

inverse(transformed_theta: Tensor) → Tensor

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the gradients of scale scores for each \(j\) with respect to the input theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.
Returns: A torch tensor with the gradients for each theta score. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.

class irtorch.rescale.Rotate(model: BaseIRTModel, data: torch.Tensor | None = None, theta: torch.Tensor | None = None, loadings: torch.Tensor | None = None, rotation_method: str = 'promax', rotation_matrix: torch.Tensor | None = None, **kwargs)

Bases: Scale

Rotates the latent variables to improve interpretability. Utilizes the factor_analyzer package for rotations.

If the model has already been rescaled using irtorch.rescale() the rotation is applied to the rescaled latent variables. If you do not want this, use irtorch.models.BaseIRTModel.detach_rescale() before applying the rotation.

Parameters

model (BaseIRTModel) – The IRT model whose scales are to be rotated.
data (torch.Tensor, optional) – The population data to compute the latent variable “loadings” for each item.
theta (torch.Tensor, optional) – The original scale theta scores to compute the latent variable “loadings” for each item.
loadings (torch.Tensor, optional) – A torch tensor with the loadings for each item. If specified, data and theta are ignored and the loadings are used for rotation. (default is None)
rotation_method (str, optional) – The rotation method to use. For available options, see factor_analyzer. (default is “promax”)
rotation_matrix (torch.Tensor, optional) – A torch tensor with the rotation matrix. If specified, data, theta and rotation_method are ignored and the rotation matrix is used directly. (default is None)
**kwargs – Additional keyword arguments used for theta estimation. Refer to irtorch.models.BaseIRTModel.latent_scores() for additional details.

inverse(transformed_theta: Tensor) → Tensor

Puts the scores back to the original theta scale.

Parameters: transformed_theta (torch.Tensor) – A 2D tensor containing transformed theta scores. Each column represents one latent variable.
Returns: A 2D tensor containing theta scores on the original scale.
Return type: torch.Tensor

jacobian(theta: Tensor) → Tensor

Computes the gradients of rotated scores for each \(j\) with respect to the original theta scores.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores on the original theta scale. Each column represents one latent variable.
Returns: A torch tensor with the gradients for each theta score. Dimensions are (theta rows, latent variables, latent variables) where the last two are the jacobians.
Return type: torch.Tensor

transform(theta: Tensor) → Tensor

Transforms the input theta scores into the new scale.

Parameters: theta (torch.Tensor) – A 2D tensor containing latent variable theta scores. Each column represents one latent variable.