Thresholding Logic
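AlignAIR uses a dynamic thresholding strategy to convert model likelihood outputs into final allele calls: the cutoff scales with the highest likelihood in each vector instead of being a fixed value. This post-processing step keeps every plausible allele while discarding low-likelihood noise.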
Overview
The AlignAIR model outputs a likelihood vector for each of the V, D, and J gene segments. Each vector contains probabilities corresponding to each possible allele in the reference set. To determine the final predicted alleles, AlignAIR applies a Maximum Likelihood Thresholding method followed by a cap enforcement procedure.
Thresholding Algorithm
Algorithm Steps
Input Likelihood Vector
p = [p₁, ..., pₙ]
Find Maximum
p_max = max(p)
Calculate Threshold
threshold = Φ × p_max
Filter Alleles
Keep every allele i such that pᵢ ≥ threshold
Apply Cap
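The steps above can be sketched in Python as follows. This is a minimal illustration rather than AlignAIR's actual implementation: the function name `select_alleles` is hypothetical, and the cap is assumed to mean "keep at most `cap` alleles, highest likelihood first", which the steps themselves do not spell out.

```python
import numpy as np


def select_alleles(likelihoods, phi, cap):
    """Return indices of retained alleles, most likely first.

    Hypothetical helper: `phi` is the threshold fraction (Φ) and `cap`
    is ASSUMED to be the maximum number of alleles to keep.
    """
    p = np.asarray(likelihoods, dtype=float)

    # Find the maximum and derive the threshold: Φ × p_max.
    threshold = phi * p.max()

    # Filter: keep every allele whose likelihood reaches the threshold.
    candidates = np.flatnonzero(p >= threshold)

    # Order the survivors by descending likelihood (stable for ties).
    ranked = candidates[np.argsort(-p[candidates], kind="stable")]

    # Apply the cap (assumed semantics: truncate to the top `cap`).
    return ranked[:cap].tolist()
```

Because the threshold is relative to `p_max`, the top-scoring allele always survives the filter when Φ ≤ 1, so the result is never empty for `cap` ≥ 1.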
Example: V Allele Selection
Φ = 0.75 (V threshold)
p_max = 0.85
threshold = 0.75 × 0.85 = 0.6375 ≈ 0.64
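To make the arithmetic concrete, the snippet below applies Φ = 0.75 to a small set of likelihoods whose maximum is 0.85. The allele names and all likelihood values other than the maximum are invented for illustration and do not come from AlignAIR output.

```python
phi = 0.75  # V threshold from the example

# Hypothetical likelihoods; only the maximum (0.85) matches the example.
likelihoods = {
    "allele_a": 0.85,
    "allele_b": 0.70,
    "allele_c": 0.60,
    "allele_d": 0.05,
}

p_max = max(likelihoods.values())
threshold = phi * p_max  # 0.75 × 0.85 = 0.6375

# Keep every allele whose likelihood reaches the threshold.
kept = [name for name, p in likelihoods.items() if p >= threshold]
print(f"threshold={threshold:.4f}, kept={kept}")
```

Here `allele_a` and `allele_b` pass, while `allele_c` at 0.60 falls just below the 0.6375 cutoff.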
Default Threshold Parameters
V Segment
D Segment
J Segment
Special Case: Short-D Handling
Due to the short and highly mutated nature of D segments, an additional label called Short-D is added to the likelihood vector.
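A small sketch of what the extra label means for the D likelihood vector: with n reference D alleles, the vector has n + 1 entries. The allele names, the likelihood values, and the choice to place Short-D last are all assumptions made for illustration; how a Short-D call is handled downstream is not shown here.

```python
# Hypothetical D reference alleles (placeholder names).
d_alleles = ["d_allele_1", "d_allele_2", "d_allele_3"]

# The label set is the reference alleles plus the extra Short-D label.
# Placing it last is an assumption of this sketch.
d_labels = d_alleles + ["Short-D"]

# Hypothetical likelihood vector: one entry per label, so n + 1 values.
d_likelihoods = [0.10, 0.15, 0.05, 0.70]
assert len(d_likelihoods) == len(d_labels)

# The most likely label is looked up like any other allele.
best_index = max(range(len(d_likelihoods)), key=d_likelihoods.__getitem__)
best_label = d_labels[best_index]
print(best_label)
```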
Short-D Logic
Example Scenario
Intuition
This method captures the probabilistic nature of the model's predictions while maintaining a clear cutoff to reduce noise.
Key Benefits:
- Retains all plausible candidates above threshold
- Avoids arbitrary top-k selection
- Balances sensitivity and specificity
Optimization Strategy
The optimal values of Φ and cap were selected via grid search to maximize agreement with ground truth labels.
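A grid search of this kind could look like the sketch below. It is illustrative only: the helper names are hypothetical, the cap is assumed to mean "keep at most `cap` alleles", and "agreement" is measured here as the mean Jaccard overlap between called and true allele sets, which is one possible choice and not necessarily the metric used for AlignAIR.

```python
from itertools import product

import numpy as np


def call_alleles(p, phi, cap):
    """Threshold at phi * max(p), then keep at most `cap` alleles (assumed)."""
    p = np.asarray(p, dtype=float)
    keep = np.flatnonzero(p >= phi * p.max())
    keep = keep[np.argsort(-p[keep], kind="stable")]
    return set(keep[:cap].tolist())


def agreement(likelihood_rows, truth_sets, phi, cap):
    """Mean Jaccard overlap between called and true allele index sets."""
    scores = []
    for p, truth in zip(likelihood_rows, truth_sets):
        called = call_alleles(p, phi, cap)
        scores.append(len(called & truth) / len(called | truth))
    return float(np.mean(scores))


def grid_search(likelihood_rows, truth_sets, phis, caps):
    """Return the (phi, cap) pair with the highest agreement score."""
    return max(
        product(phis, caps),
        key=lambda pc: agreement(likelihood_rows, truth_sets, *pc),
    )
```

Ties are resolved in favour of the first pair in iteration order, so the ordering of `phis` and `caps` matters when several settings score equally.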
Optimization Goals:
📖 References
See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details and performance analysis.