
Thresholding Logic

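AlignAIR uses a dynamic thresholding strategy to convert the model's likelihood outputs into final allele calls. Because the cutoff is set relative to the highest likelihood in each prediction, this post-processing step can report several plausible alleles when the model is uncertain while discarding low-likelihood candidates.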

// 01 / overview

Overview

The AlignAIR model outputs one likelihood vector for each of the V, D, and J gene segments, with one entry for each allele in the reference set. To determine the final predicted alleles, AlignAIR applies Maximum Likelihood Thresholding followed by a cap-enforcement step.

// 02 / algorithm

Thresholding algorithm

steps

  1. Input likelihood vector. For each segment (V, D, J), let the model output be p = [p₁, …, pₙ], where n is the number of alleles in the reference set.
  2. Find maximum. Compute the maximum likelihood p_max = max(p).
  3. Calculate threshold. Set threshold = Φ × p_max, where Φ is the segment's threshold parameter.
  4. Filter alleles. Keep every allele i whose likelihood satisfies pᵢ ≥ threshold.
  5. Apply cap. If more alleles pass the filter than the segment's cap allows, keep only the highest-scoring ones, up to the cap.

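example · V allele selection

Input likelihood vector:
  IGHV1-2*01   0.92
  IGHV1-3*01   0.78
  IGHV1-4*01   0.45
Threshold (Φ = 0.75): threshold = p_max × Φ = 0.92 × 0.75 = 0.69
Kept (pᵢ ≥ 0.69): IGHV1-2*01, IGHV1-3*01
Two alleles pass, which is within the default V cap of 3, so the cap removes no calls.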
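The five steps and the V example can be sketched in Python as below. This is a minimal illustration, not AlignAIR's own implementation: the function name threshold_alleles and the input format (a dictionary mapping allele name to likelihood) are assumptions chosen for readability.

```python
from typing import Dict, List, Tuple


def threshold_alleles(
    likelihoods: Dict[str, float], phi: float, cap: int
) -> List[Tuple[str, float]]:
    """Maximum likelihood thresholding with a cap (hypothetical helper).

    likelihoods: allele name -> model likelihood for one segment (step 1).
    phi: the segment's threshold factor (Φ).
    cap: the maximum number of alleles to report.
    """
    if not likelihoods:
        return []
    # Step 2: the highest likelihood in the vector.
    p_max = max(likelihoods.values())
    # Step 3: the cutoff is a fixed fraction of the top score.
    threshold = phi * p_max
    # Step 4: keep every allele at or above the cutoff.
    kept = [(allele, p) for allele, p in likelihoods.items() if p >= threshold]
    # Step 5: sort by likelihood and keep at most `cap` alleles.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:cap]


# The V example from this section.
v_likelihoods = {"IGHV1-2*01": 0.92, "IGHV1-3*01": 0.78, "IGHV1-4*01": 0.45}
calls = threshold_alleles(v_likelihoods, phi=0.75, cap=3)
print([allele for allele, _ in calls])  # ['IGHV1-2*01', 'IGHV1-3*01']
```

Sorting before slicing is what makes the cap keep the top-scoring alleles rather than whichever happened to come first in the vector.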
// 03 / defaults

Default threshold parameters

Segment   Threshold (Φ)   Default cap
V         0.75            3
D         0.30            3
J         0.80            3

V segment: high threshold, reflecting the length and conservation of the V region.
D segment: low threshold, because the D region is short and highly mutated.
J segment: high threshold, because the J region is conserved.
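One way to hold these defaults in code is a small per-segment configuration, sketched below. The SegmentThreshold class, the DEFAULTS mapping, and the cutoff helper are hypothetical names; only the Φ and cap values come from the defaults listed above. The loop shows how the same top score of 0.60 (an arbitrary example value) yields a much lower absolute cutoff for D than for V or J.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentThreshold:
    phi: float  # threshold factor Φ
    cap: int    # maximum number of alleles reported


# Default values from this section; the mapping layout itself is hypothetical.
DEFAULTS = {
    "V": SegmentThreshold(phi=0.75, cap=3),
    "D": SegmentThreshold(phi=0.30, cap=3),
    "J": SegmentThreshold(phi=0.80, cap=3),
}


def cutoff(segment: str, p_max: float) -> float:
    """Absolute likelihood cutoff for a segment, given its top score."""
    return DEFAULTS[segment].phi * p_max


# With the same top score, D keeps alleles down to a much lower likelihood.
for seg in ("V", "D", "J"):
    print(seg, round(cutoff(seg, 0.60), 2))
```

Freezing the dataclass keeps the defaults from being changed accidentally at runtime; overrides would be passed as new SegmentThreshold instances instead.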

// 04 / short-d

Special case: Short-D handling

Due to the short and highly mutated nature of D segments, an additional label, Short-D, is added to the D likelihood vector.

logic

  1. Detection: check whether the Short-D likelihood exceeds 0.5.
  2. Suppression: if it does, apply a penalty to the predictions for the other D alleles.
  3. Consistency: this keeps the allele classification consistent with the segmentation.

example scenario

Before check:
  IGHD1-1*01   0.45
  IGHD2-2*01   0.38
  Short-D      0.65
The Short-D likelihood (0.65) exceeds 0.5, so suppression is applied.
After suppression:
  IGHD1-1*01   suppressed
  IGHD2-2*01   suppressed
Result: no D call
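The Short-D logic can be sketched as a check that runs before D thresholding. The text does not specify the penalty applied to the other D alleles, so this sketch assumes the strongest version, setting their likelihoods to zero, which reproduces the example's "no D call" result. It also assumes the Short-D label itself is never reported as an allele. The function name call_d_alleles is hypothetical; Φ = 0.30 and cap = 3 are the D defaults listed above.

```python
from typing import Dict, List

SHORT_D_LABEL = "Short-D"  # label name as used in the example
SHORT_D_CUTOFF = 0.5       # detection rule: Short-D likelihood > 0.5


def call_d_alleles(d_likelihoods: Dict[str, float],
                   phi: float = 0.30, cap: int = 3) -> List[str]:
    """Hypothetical sketch: Short-D check, then D thresholding.

    Assumes the suppression penalty zeroes every D allele likelihood,
    which reproduces the "no D call" example; the real penalty may differ.
    """
    scores = dict(d_likelihoods)
    # Detection: read off the Short-D label; it is never reported as an allele.
    short_d = scores.pop(SHORT_D_LABEL, 0.0)
    if short_d > SHORT_D_CUTOFF:
        # Suppression (assumed form): penalize every real D allele to zero.
        scores = {allele: 0.0 for allele in scores}
    if not scores or max(scores.values()) == 0.0:
        return []  # no D call
    threshold = phi * max(scores.values())
    kept = sorted((a for a, p in scores.items() if p >= threshold),
                  key=scores.get, reverse=True)
    return kept[:cap]


example = {"IGHD1-1*01": 0.45, "IGHD2-2*01": 0.38, "Short-D": 0.65}
print(call_d_alleles(example))  # []
print(call_d_alleles({**example, "Short-D": 0.20}))  # ['IGHD1-1*01', 'IGHD2-2*01']
```

A softer penalty (for example, scaling the D likelihoods down instead of zeroing them) would slot into the same place in the sketch without changing the surrounding thresholding.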
// intuition

Why this works

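This method respects the probabilistic nature of the model's predictions while keeping a clear cutoff to reduce noise: every allele scoring close to the top candidate is reported, and anything below Φ × p_max is discarded.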

  • Retains all plausible candidates above threshold
  • Avoids an arbitrary fixed top-k: the number of calls follows the likelihood profile, and the cap only sets an upper bound
  • Balances sensitivity and specificity
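The difference from a fixed top-k rule is easiest to see on two toy likelihood vectors, one peaked and one flat. The allele names and scores below are invented for illustration, and the cap is omitted so that only the relative cutoff is compared against top-k.

```python
from typing import Dict, List


def relative_threshold(p: Dict[str, float], phi: float) -> List[str]:
    """Keep alleles scoring at least phi * max(p) (cap omitted here)."""
    cutoff = phi * max(p.values())
    return [a for a, s in p.items() if s >= cutoff]


def fixed_top_k(p: Dict[str, float], k: int) -> List[str]:
    """Always keep exactly k alleles, however low they score."""
    return sorted(p, key=p.get, reverse=True)[:k]


# Invented likelihoods: one peaked vector, one flat vector.
peaked = {"allele_A": 0.95, "allele_B": 0.10, "allele_C": 0.05}
flat = {"allele_A": 0.80, "allele_B": 0.78, "allele_C": 0.76}
for name, p in (("peaked", peaked), ("flat", flat)):
    print(name, relative_threshold(p, 0.75), fixed_top_k(p, 3))
```

With Φ = 0.75, the relative rule reports one allele for the peaked vector and all three for the flat one, while top-3 reports three in both cases, including alleles the model scores near zero.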
// optimization

Optimization strategy

The values of Φ and the cap for each segment were selected by grid search to maximize agreement with ground-truth allele labels.

  • Sensitivity: 83%
  • Specificity: 80%
  • Efficiency: 75%
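The grid search can be sketched as an exhaustive loop over candidate (Φ, cap) pairs. The candidate grid, agreement measure, and data used for AlignAIR are not described here, so this sketch assumes mean Jaccard similarity between predicted and true allele sets, a small hand-made toy dataset, and a coarse grid; none of these values come from AlignAIR.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# One example: model likelihoods and the true allele set.
Sample = Tuple[Dict[str, float], Set[str]]


def predict(p: Dict[str, float], phi: float, cap: int) -> Set[str]:
    """Threshold at phi * max(p), then keep at most `cap` top alleles."""
    p_max = max(p.values())
    kept = sorted((a for a, s in p.items() if s >= phi * p_max),
                  key=p.get, reverse=True)
    return set(kept[:cap])


def jaccard(pred: Set[str], true: Set[str]) -> float:
    """Assumed agreement measure: |pred ∩ true| / |pred ∪ true|."""
    union = pred | true
    return len(pred & true) / len(union) if union else 1.0


def grid_search(data: List[Sample], phis: Sequence[float],
                caps: Sequence[int]) -> Tuple[float, int, float]:
    """Return (phi, cap, mean agreement) for the best grid point."""
    best_phi, best_cap, best_score = phis[0], caps[0], -1.0
    for phi, cap in product(phis, caps):
        score = sum(jaccard(predict(p, phi, cap), t) for p, t in data) / len(data)
        if score > best_score:  # ties keep the earlier grid point
            best_phi, best_cap, best_score = phi, cap, score
    return best_phi, best_cap, best_score


# Toy data (invented): an ambiguous pair, a clear winner, a flat triple.
toy = [
    ({"a": 0.90, "b": 0.85, "c": 0.20}, {"a", "b"}),
    ({"a": 0.90, "b": 0.30, "c": 0.10}, {"a"}),
    ({"a": 0.60, "b": 0.55, "c": 0.50}, {"a", "b", "c"}),
]
print(grid_search(toy, [0.1, 0.3, 0.5, 0.7, 0.9], [1, 2, 3]))  # (0.5, 3, 1.0)
```

On this toy set, Φ = 0.3 is too permissive (it keeps the 0.30 allele in the second example) and cap values below 3 cut the flat triple short, so the first perfect grid point is Φ = 0.5 with a cap of 3.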
// references

References

See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details and performance analysis.

© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI