Thresholding Logic
AlignAIR uses a dynamic thresholding strategy to convert model likelihood outputs into final allele calls. This post-processing step ensures robustness while maintaining alignment accuracy.
Overview
The AlignAIR model outputs a likelihood vector for each of the V, D, and J gene segments. Each vector contains probabilities corresponding to each possible allele in the reference set. To determine the final predicted alleles, AlignAIR applies a Maximum Likelihood Thresholding method followed by a cap enforcement procedure.
Algorithm
- For each segment (V, D, J), let the output vector be $p = [p_1, \dots, p_n]$.
- Compute the maximum likelihood: $p_{\max} = \max_i p_i$.
- Define the threshold: $\text{threshold} = \Phi \times p_{\max}$, where $\Phi$ is a segment-specific parameter (e.g. 0.75 for V, 0.3 for D, 0.8 for J).
- Filter alleles: keep every allele $i$ such that $p_i \ge \text{threshold}$.
- Apply the cap: if the number of alleles passing the threshold exceeds a predefined cap (e.g. 3), keep only the top-scoring ones.
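The steps above can be sketched in Python as follows. This is a minimal illustration, not AlignAIR's own code: the helper name `threshold_alleles`, the use of NumPy, and the allele names and likelihoods in the example are assumptions chosen for illustration; only the Φ and cap values come from the text.

```python
import numpy as np

# Example segment-specific Phi values quoted in the text; tuned values may differ.
PHI = {"V": 0.75, "D": 0.3, "J": 0.8}


def threshold_alleles(likelihoods, allele_names, phi, cap=3):
    """Hypothetical helper: keep alleles with p_i >= phi * max(p), then apply a cap."""
    p = np.asarray(likelihoods, dtype=float)
    threshold = phi * p.max()                    # dynamic, per-sequence cutoff
    passing = np.flatnonzero(p >= threshold)     # indices that pass the threshold
    # Rank passing alleles by likelihood, highest first (stable order on ties).
    ranked = passing[np.argsort(-p[passing], kind="stable")]
    # Cap enforcement: keep at most `cap` top-scoring alleles.
    return [(allele_names[i], float(p[i])) for i in ranked[:cap]]


# Illustrative inputs: made-up likelihoods over a toy five-allele V reference.
v_names = ["IGHV1-2*02", "IGHV1-2*04", "IGHV1-2*06", "IGHV1-2*05", "IGHV3-23*01"]
v_probs = [0.92, 0.88, 0.81, 0.70, 0.05]
calls = threshold_alleles(v_probs, v_names, PHI["V"])
print(calls)
```

Here the threshold is 0.75 × 0.92 = 0.69, so four alleles pass and the cap trims the result to the three highest-scoring ones. Because Φ ≤ 1, the most likely allele always passes, so at least one allele is returned.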
Intuition
This method respects the probabilistic nature of the model's predictions while keeping a clear cutoff that filters out noise. Because the threshold is a fraction of the maximum likelihood, it adapts to how confident the model is for each sequence: if several alleles are nearly as likely as the best one, all of them are retained, instead of an arbitrary top-k being selected. The cap prevents the method from becoming overly permissive when many alleles pass the threshold.
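To make the contrast with top-k concrete, the sketch below compares a fixed top-k selection with the relative threshold on two made-up likelihood vectors. The helper names `top_k` and `dynamic` are hypothetical; Φ = 0.75 and cap = 3 are the example V-segment values from the algorithm above.

```python
import numpy as np


def top_k(p, k=3):
    """Fixed top-k: always returns k allele indices, however the scores look."""
    p = np.asarray(p, dtype=float)
    return sorted(np.argsort(-p, kind="stable")[:k].tolist())


def dynamic(p, phi=0.75, cap=3):
    """Relative threshold phi * max(p) followed by a cap."""
    p = np.asarray(p, dtype=float)
    passing = np.flatnonzero(p >= phi * p.max())
    ranked = passing[np.argsort(-p[passing], kind="stable")]
    return sorted(ranked[:cap].tolist())


confident = [0.95, 0.10, 0.08, 0.02]  # one clear winner
ambiguous = [0.60, 0.58, 0.20, 0.05]  # two near-identical candidates

for label, p in [("confident", confident), ("ambiguous", ambiguous)]:
    print(label, "top-k:", top_k(p), "dynamic:", dynamic(p))
```

Top-k returns three alleles in both cases, while the relative threshold returns only the clear winner for the confident vector and both close candidates for the ambiguous one.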
Optimization Strategy
The optimal values of $\Phi$ and the cap were selected via grid search to maximize agreement with ground-truth labels while minimizing the number of alleles returned. This balances sensitivity (returning all plausible candidates) against specificity (not returning noise).
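A grid search of this kind can be sketched as below. The text does not say how agreement and the number of returned alleles are combined, so the objective here (hit rate minus a hypothetical weight `lam` times the mean number of calls) is an assumption, as are the helper names, the grid values, and the toy labelled data.

```python
import itertools

import numpy as np


def call_alleles(p, phi, cap):
    """Dynamic threshold phi * max(p), then keep at most `cap` alleles."""
    p = np.asarray(p, dtype=float)
    passing = np.flatnonzero(p >= phi * p.max())
    return passing[np.argsort(-p[passing], kind="stable")][:cap].tolist()


def objective(samples, phi, cap, lam=0.1):
    """Assumed objective: hit rate minus lam * mean number of alleles returned."""
    hits = calls_total = 0
    for p, true_idx in samples:
        calls = call_alleles(p, phi, cap)
        hits += int(true_idx in calls)
        calls_total += len(calls)
    n = len(samples)
    return hits / n - lam * calls_total / n


def grid_search(samples, phis, caps, lam=0.1):
    """Exhaustive search over (phi, cap); ties go to the first pair in grid order."""
    return max(itertools.product(phis, caps),
               key=lambda pc: objective(samples, pc[0], pc[1], lam))


# Toy labelled data: (likelihood vector, index of the true allele). Made up for illustration.
samples = [
    ([0.90, 0.85, 0.10], 1),  # true allele ranked second, but close to the top
    ([0.95, 0.05, 0.02], 0),
    ([0.70, 0.20, 0.65], 2),  # true allele ranked second again
    ([0.80, 0.30, 0.10], 0),
]
best_phi, best_cap = grid_search(samples, phis=[0.3, 0.5, 0.75, 0.9], caps=[1, 2, 3])
print(best_phi, best_cap)
```

With this toy data, cap = 1 misses the two sequences whose true allele is ranked second, Φ = 0.3 admits an extra low-scoring allele for the last sequence, and the search settles on Φ = 0.5 with cap = 2 (the first of several tied settings in grid order).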
Special Case: D Region
Due to the short and highly mutated nature of D segments, an additional label called Short-D is added to the likelihood vector. If this label receives a high probability (> 0.5), the model suppresses the other D allele predictions using a penalty term. This keeps segmentation and classification consistent and avoids spurious allele calls when the D region is unreliable.
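The text describes the suppression as a penalty term whose exact form is given in the manuscript. The sketch below substitutes a simpler hard rule applied at post-processing time, purely to illustrate the intended behaviour: if Short-D exceeds 0.5, no D allele is called. The helper name `call_d_alleles`, excluding Short-D from the ordinary thresholding, and the example allele names and likelihoods are all assumptions.

```python
import numpy as np

SHORT_D = "Short-D"


def call_d_alleles(likelihoods, allele_names, phi=0.3, cap=3, short_d_cutoff=0.5):
    """Hypothetical D caller: a hard Short-D rule, then threshold + cap on real D alleles."""
    p = np.asarray(likelihoods, dtype=float)
    sd = allele_names.index(SHORT_D)  # position of the Short-D label, looked up by name
    if p[sd] > short_d_cutoff:
        # D region judged unreliable: report Short-D and suppress every D allele call.
        return [SHORT_D]
    # Otherwise threshold only the real D alleles; Short-D is excluded (assumption).
    idx = np.array([i for i in range(len(p)) if i != sd])
    sub = p[idx]
    passing = idx[sub >= phi * sub.max()]
    ranked = passing[np.argsort(-p[passing], kind="stable")]
    return [allele_names[i] for i in ranked[:cap]]


d_names = ["IGHD3-10*01", "IGHD3-22*01", "IGHD2-2*01", SHORT_D]
print(call_d_alleles([0.10, 0.05, 0.02, 0.85], d_names))  # Short-D dominates
print(call_d_alleles([0.70, 0.40, 0.05, 0.10], d_names))  # ordinary D call
```

In the first call Short-D (0.85) exceeds the cutoff, so only Short-D is reported; in the second, the threshold is 0.3 × 0.70 = 0.21 and the two D alleles above it are returned.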
References
See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details.