Thresholding Logic
AlignAIR uses a dynamic thresholding strategy to convert model likelihood outputs into final allele calls. This post-processing step makes the calls robust to ambiguity: alleles whose likelihood is close to the maximum are retained rather than discarded.
Overview
The AlignAIR model outputs a likelihood vector for each of the V, D, and J gene segments. Each vector contains probabilities corresponding to each possible allele in the reference set. To determine the final predicted alleles, AlignAIR applies a Maximum Likelihood Thresholding method followed by a cap enforcement procedure.
Thresholding algorithm
steps
- 1. Input likelihood vector: p = [p₁, …, pₙ]
- 2. Find maximum: p_max = max(p)
- 3. Calculate threshold: threshold = Φ × p_max
- 4. Filter alleles: keep every pᵢ such that pᵢ ≥ threshold
- 5. Apply cap: enforce the cap on the filtered alleles
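The steps above can be sketched in a few lines of Python. This is a minimal illustration, not AlignAIR's actual implementation: the function name `select_alleles`, the parameter names `phi` and `cap`, and the sample likelihoods are all hypothetical. It also assumes that the cap keeps the most likely survivors of the filter; the text does not spell out the exact cap rule.

```python
from typing import List, Sequence


def select_alleles(likelihoods: Sequence[float], phi: float, cap: int) -> List[int]:
    """Return indices of alleles with likelihood >= phi * max, at most `cap` of them."""
    if not likelihoods:
        return []
    p_max = max(likelihoods)           # step 2: find maximum
    threshold = phi * p_max            # step 3: relative threshold
    # Step 4: keep every allele at or above the threshold.
    kept = [i for i, p in enumerate(likelihoods) if p >= threshold]
    # Step 5 (assumed rule): keep only the `cap` most likely survivors.
    kept.sort(key=lambda i: likelihoods[i], reverse=True)
    return kept[:cap]


# Illustrative values only, not taken from AlignAIR.
v_likelihoods = [0.05, 0.90, 0.70, 0.10, 0.80]
selected = select_alleles(v_likelihoods, phi=0.75, cap=2)
print(selected)  # [1, 4]
```

With these values the threshold is 0.675, so three alleles pass the filter and the cap of 2 keeps the two most likely ones.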
example · V allele selection
Default threshold parameters
V segment
High threshold due to V region length and conservation.
D segment
Lower threshold due to the D region’s short length and high mutation rate.
J segment
High threshold due to J region’s conserved nature.
Special case: Short-D handling
Due to the short and highly mutated nature of D segments, an additional label called Short-D is added to the likelihood vector.
logic
- 1. Detection: if the Short-D probability > 0.5
- 2. Suppression: apply a penalty to the other D allele predictions
- 3. Consistency: ensures alignment between segmentation and classification
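One way to picture the detection and suppression steps is the sketch below. It is a hedged illustration only: the text does not say what form the penalty takes, so a multiplicative `penalty` factor is assumed here, and the function name, parameter names, and likelihood values are hypothetical. The allele names are used purely as example keys.

```python
from typing import Dict

SHORT_D = "Short-D"  # label name as used in the text


def apply_short_d_rule(d_likelihoods: Dict[str, float],
                       penalty: float,
                       short_d_cutoff: float = 0.5) -> Dict[str, float]:
    """If Short-D exceeds the cutoff, scale every other D allele by `penalty`."""
    # Detection: Short-D must be strictly above the cutoff.
    if d_likelihoods.get(SHORT_D, 0.0) <= short_d_cutoff:
        return dict(d_likelihoods)  # not detected, leave likelihoods unchanged
    # Suppression (assumed multiplicative form): penalize all other D alleles.
    return {
        allele: p if allele == SHORT_D else p * penalty
        for allele, p in d_likelihoods.items()
    }


# Illustrative values only.
d_likelihoods = {"IGHD1-1*01": 0.25, "IGHD2-2*01": 0.125, SHORT_D: 0.625}
adjusted = apply_short_d_rule(d_likelihoods, penalty=0.5)
print(adjusted)
```

Under these assumptions, the penalized vector would then go through the usual thresholding, which makes Short-D the dominant call whenever it is detected.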
example scenario
Why this works
This method captures the probabilistic nature of the model's predictions while maintaining a clear cutoff to reduce noise.
- Retains all plausible candidates above threshold
- Avoids arbitrary top-k selection
- Balances sensitivity and specificity
Optimization strategy
The optimal values of Φ and cap were selected via grid search to maximize agreement with ground truth labels.
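A grid search of this kind could look roughly like the sketch below. The agreement measure used by AlignAIR is not stated here, so Jaccard overlap between predicted and true allele sets stands in for it as an assumption. The helper names, the candidate grids, and the toy samples are hypothetical, and the cap is again assumed to keep the most likely alleles.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], Set[int]]  # (likelihood vector, true allele indices)


def call_alleles(p: Sequence[float], phi: float, cap: int) -> List[int]:
    """Threshold at phi * max(p), then keep the `cap` most likely alleles."""
    threshold = phi * max(p)
    kept = [i for i, v in enumerate(p) if v >= threshold]
    kept.sort(key=lambda i: p[i], reverse=True)
    return kept[:cap]


def agreement(predicted: Set[int], truth: Set[int]) -> float:
    """Jaccard overlap, used here as a stand-in agreement score."""
    union = predicted | truth
    return len(predicted & truth) / len(union) if union else 1.0


def grid_search(samples: Sequence[Sample],
                phis: Sequence[float],
                caps: Sequence[int]) -> Tuple[Optional[float], Optional[int], float]:
    """Return (phi, cap, mean agreement) for the best-scoring pair; ties keep the first."""
    best: Tuple[Optional[float], Optional[int], float] = (None, None, -1.0)
    for phi, cap in product(phis, caps):
        scores = [agreement(set(call_alleles(p, phi, cap)), t) for p, t in samples]
        mean_score = sum(scores) / len(scores)
        if mean_score > best[2]:
            best = (phi, cap, mean_score)
    return best


# Toy data, for illustration only.
samples = [([0.9, 0.8, 0.1], {0, 1}), ([0.1, 0.95, 0.2], {1})]
best = grid_search(samples, phis=[0.25, 0.5, 0.75], caps=[1, 2, 3])
print(best)  # (0.25, 2, 1.0)
```

In practice the search would be run separately for each segment, since the V, D, and J segments use different thresholds.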
References
See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details and performance analysis.