
Thresholding Logic

AlignAIR uses a dynamic thresholding strategy to convert model likelihood outputs into final allele calls. This post-processing step ensures robustness while maintaining alignment accuracy.

Overview

The AlignAIR model outputs a likelihood vector for each of the V, D, and J gene segments. Each vector contains probabilities corresponding to each possible allele in the reference set. To determine the final predicted alleles, AlignAIR applies a Maximum Likelihood Thresholding method followed by a cap enforcement procedure.

Thresholding Algorithm

Algorithm Steps

1. Input Likelihood Vector. For each segment (V, D, J), let the output vector be p = [p₁, ..., pₙ].
2. Find Maximum. Compute the maximum likelihood: p_max = max(p).
3. Calculate Threshold. Define the threshold: threshold = Φ × p_max.
4. Filter Alleles. Keep all pᵢ such that pᵢ ≥ threshold.
5. Apply Cap. If the number of retained alleles exceeds the cap, keep only the top-scoring alleles up to the cap.
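The five steps can be sketched in a few lines of Python. This is a minimal illustration rather than AlignAIR's actual implementation: the function name `select_alleles` and the dictionary input format are assumptions made for the example, and the scores are the ones used in the V allele example on this page.

```python
from typing import Dict, List, Tuple


def select_alleles(
    likelihoods: Dict[str, float], phi: float, cap: int
) -> List[Tuple[str, float]]:
    """Hypothetical helper: threshold at phi * p_max, then enforce the cap."""
    if not likelihoods:
        return []
    # Step 2: maximum likelihood in the vector.
    p_max = max(likelihoods.values())
    # Step 3: threshold as a fraction phi of the maximum.
    threshold = phi * p_max
    # Step 4: keep every allele whose likelihood reaches the threshold.
    kept = [(name, p) for name, p in likelihoods.items() if p >= threshold]
    # Step 5: sort by likelihood and keep at most `cap` alleles.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:cap]


v_scores = {
    "IGHV1-2*01": 0.85,
    "IGHV1-3*01": 0.72,
    "IGHV1-4*01": 0.58,
    "IGHV2-1*01": 0.23,
}
selected = select_alleles(v_scores, phi=0.75, cap=3)
print(selected)
```

With Φ = 0.75 the threshold is 0.6375, so only the first two alleles are returned and the cap of 3 is not reached.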

Example: V Allele Selection

Input Likelihood Vector:

Allele        Likelihood
IGHV1-2*01    0.85
IGHV1-3*01    0.72
IGHV1-4*01    0.58
IGHV2-1*01    0.23

Threshold Calculation:
p_max = 0.85
Φ = 0.75 (V threshold)
threshold = 0.75 × 0.85 = 0.6375 ≈ 0.64

Selected Alleles:
IGHV1-2*01 (0.85 ≥ 0.64)
IGHV1-3*01 (0.72 ≥ 0.64)

Rejected Alleles:
IGHV1-4*01 (0.58 < 0.64)
IGHV2-1*01 (0.23 < 0.64)

Default Threshold Parameters

Segment   Threshold (Φ)   Default Cap   Rationale
V         0.75            3             High threshold due to the V region's length and conservation
D         0.30            3             Lower threshold due to the D region's short length and high mutation
J         0.80            3             High threshold due to the J region's conserved nature
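To see how the per-segment defaults change the outcome, the sketch below applies the same made-up likelihood vector under the V, D, and J settings. The names `SegmentParams`, `DEFAULT_PARAMS`, and `call_alleles` are hypothetical and are not part of the AlignAIR API; only the Φ and cap values come from this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SegmentParams:
    phi: float  # threshold fraction applied to p_max
    cap: int    # maximum number of alleles reported


# Defaults as listed on this page; the container itself is an assumption.
DEFAULT_PARAMS: Dict[str, SegmentParams] = {
    "V": SegmentParams(phi=0.75, cap=3),
    "D": SegmentParams(phi=0.30, cap=3),
    "J": SegmentParams(phi=0.80, cap=3),
}


def call_alleles(segment: str, likelihoods: Dict[str, float]) -> List[str]:
    """Threshold and cap one likelihood vector using the segment's defaults."""
    params = DEFAULT_PARAMS[segment]
    threshold = params.phi * max(likelihoods.values())
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, p in ranked if p >= threshold][: params.cap]


# Illustrative scores only; the allele names are placeholders.
scores = {"allele_a": 0.90, "allele_b": 0.70, "allele_c": 0.40, "allele_d": 0.10}
calls = {segment: call_alleles(segment, scores) for segment in ("V", "D", "J")}
print(calls)
```

The permissive D threshold (0.27 here) admits three candidates, while the strict J threshold (0.72) leaves only the top allele.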

Special Case: Short-D Handling

Due to the short and highly mutated nature of D segments, an additional label called Short-D is added to the likelihood vector.

Short-D Logic

1. Detection. Check whether the Short-D probability is greater than 0.5.
2. Suppression. If it is, apply a penalty to the other D allele predictions.
3. Consistency. The suppression ensures alignment between segmentation and classification.

Example Scenario

Before Short-D Check:
IGHD1-1*01    0.45
IGHD2-2*01    0.38
Short-D       0.65

After Short-D Suppression:
IGHD1-1*01    Suppressed
IGHD2-2*01    Suppressed
Result        No D call
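The scenario above can be reproduced with a short sketch. Two points are assumptions, since this page does not specify them: the penalty is modelled as suppressing every D allele outright (which matches the "No D call" result), and the Short-D label is excluded from the p_max computation when it does not fire. The function name `call_d_alleles` is hypothetical.

```python
from typing import Dict, List

SHORT_D_LABEL = "Short-D"
SHORT_D_CUTOFF = 0.5  # detection cutoff stated on this page


def call_d_alleles(
    likelihoods: Dict[str, float], phi: float = 0.30, cap: int = 3
) -> List[str]:
    """Return D allele calls, or an empty list when Short-D is detected."""
    # Detection: Short-D probability above the cutoff.
    if likelihoods.get(SHORT_D_LABEL, 0.0) > SHORT_D_CUTOFF:
        # Suppression (assumed form): drop all D allele predictions.
        return []
    # Otherwise apply the usual threshold and cap to the real alleles only.
    alleles = {k: v for k, v in likelihoods.items() if k != SHORT_D_LABEL}
    if not alleles:
        return []
    threshold = phi * max(alleles.values())
    ranked = sorted(alleles.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, p in ranked if p >= threshold][:cap]


suppressed = call_d_alleles({"IGHD1-1*01": 0.45, "IGHD2-2*01": 0.38, "Short-D": 0.65})
normal = call_d_alleles({"IGHD1-1*01": 0.45, "IGHD2-2*01": 0.38, "Short-D": 0.20})
print(suppressed, normal)
```

When the Short-D probability is lowered to 0.20 in the second call, both D alleles pass the 0.30 × 0.45 threshold and are reported.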

Intuition

This method captures the probabilistic nature of the model's predictions while maintaining a clear cutoff to reduce noise.

Key Benefits:

  • Retains all plausible candidates above threshold
  • Avoids arbitrary top-k selection
  • Balances sensitivity and specificity

Optimization Strategy

The optimal values of Φ and cap were selected via grid search to maximize agreement with ground truth labels.

Optimization Goals:

Sensitivity
Specificity
Efficiency
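A grid search over Φ and cap might look like the sketch below. Everything beyond the idea of a grid search is an assumption: the agreement metric (mean Jaccard overlap between predicted and true allele sets), the candidate grids, and the toy samples are illustrative only and are not taken from the AlignAIR manuscript.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Sample = Tuple[Dict[str, float], Set[str]]  # (likelihoods, ground-truth alleles)


def select(likelihoods: Dict[str, float], phi: float, cap: int) -> Set[str]:
    """Threshold at phi * p_max, then keep at most `cap` top-scoring alleles."""
    threshold = phi * max(likelihoods.values())
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    return {name for name, p in ranked[:cap] if p >= threshold}


def jaccard(pred: Set[str], truth: Set[str]) -> float:
    """Assumed agreement metric: intersection over union of the two sets."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


def grid_search(
    samples: List[Sample], phis: Sequence[float], caps: Sequence[int]
) -> Tuple[float, float, int]:
    """Return (best mean agreement, phi, cap); ties keep the first setting seen."""
    best_score, best_phi, best_cap = -1.0, phis[0], caps[0]
    for phi, cap in product(phis, caps):
        score = sum(jaccard(select(l, phi, cap), t) for l, t in samples) / len(samples)
        if score > best_score:
            best_score, best_phi, best_cap = score, phi, cap
    return best_score, best_phi, best_cap


# Toy data for illustration only.
toy_samples: List[Sample] = [
    ({"a": 0.9, "b": 0.8, "c": 0.2}, {"a", "b"}),
    ({"a": 0.9, "b": 0.3, "c": 0.1}, {"a"}),
]
best_score, best_phi, best_cap = grid_search(
    toy_samples, phis=[0.25, 0.5, 0.75, 0.95], caps=[1, 2, 3]
)
print(best_score, best_phi, best_cap)
```

A single set-overlap score is only one way to trade sensitivity against specificity; a real search would likely track these separately.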

📖 References

See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details and performance analysis.
