Thresholding Logic

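AlignAIR uses a dynamic thresholding strategy to convert the model's likelihood outputs into final allele calls. The cutoff is set relative to the top-scoring allele of each sequence rather than fixed, so this post-processing step adapts to how confident the model is while filtering out low-likelihood calls.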

Overview

The AlignAIR model outputs a likelihood vector for each of the V, D, and J gene segments. Each vector contains probabilities corresponding to each possible allele in the reference set. To determine the final predicted alleles, AlignAIR applies a Maximum Likelihood Thresholding method followed by a cap enforcement procedure.

Thresholding Algorithm

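Algorithm Steps

1. Input likelihood vector: for each segment (V, D, J), let the model output be p = [p₁, ..., pₙ], where n is the number of alleles in that segment's reference set.
2. Find maximum: compute the maximum likelihood p_max = max(p).
3. Calculate threshold: threshold = Φ × p_max, where Φ is a segment-specific fraction (defaults are listed below).
4. Filter alleles: keep every allele i with pᵢ ≥ threshold.
5. Apply cap: if more alleles pass the threshold than the cap allows, keep only the highest-scoring alleles up to the cap.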
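The five steps can be sketched in Python as follows. This is a minimal illustration, not AlignAIR's implementation: it assumes the likelihoods arrive as a plain mapping from allele name to likelihood, and the function name `threshold_alleles` and the cap value used in the call are hypothetical.

```python
from typing import Dict, List, Tuple


def threshold_alleles(
    likelihoods: Dict[str, float], phi: float, cap: int
) -> List[Tuple[str, float]]:
    """Maximum-likelihood thresholding followed by cap enforcement.

    Hypothetical helper for illustration; not part of the AlignAIR API.
    """
    if not likelihoods:
        return []
    # Step 2: the highest likelihood in the vector.
    p_max = max(likelihoods.values())
    # Step 3: the threshold is a fraction phi of that maximum.
    threshold = phi * p_max
    # Step 4: keep every allele at or above the threshold, best first.
    kept = sorted(
        ((allele, p) for allele, p in likelihoods.items() if p >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    # Step 5: if more alleles pass than the cap allows, keep the top `cap`.
    return kept[:cap]


v_likelihoods = {
    "IGHV1-2*01": 0.85,
    "IGHV1-3*01": 0.72,
    "IGHV1-4*01": 0.58,
    "IGHV2-1*01": 0.23,
}
# cap=3 is an arbitrary illustrative value, not AlignAIR's default.
print(threshold_alleles(v_likelihoods, phi=0.75, cap=3))
# [('IGHV1-2*01', 0.85), ('IGHV1-3*01', 0.72)]
```

With the V example values, only IGHV1-2*01 and IGHV1-3*01 pass the threshold of 0.6375; lowering the cap to 1 would leave only IGHV1-2*01.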

Example: V Allele Selection

Input Likelihood Vector:

Allele        Likelihood
IGHV1-2*01    0.85
IGHV1-3*01    0.72
IGHV1-4*01    0.58
IGHV2-1*01    0.23

Threshold Calculation:

p_max = 0.85
Φ = 0.75 (V threshold)
threshold = 0.75 × 0.85 = 0.6375 (≈ 0.64)

Selected Alleles:

IGHV1-2*01 (0.85 ≥ 0.64)
IGHV1-3*01 (0.72 ≥ 0.64)

Rejected Alleles:

IGHV1-4*01 (0.58 < 0.64)
IGHV2-1*01 (0.23 < 0.64)

Default Threshold Parameters

Segment   Threshold (Φ)   Default cap   Rationale
V         0.75            …             High threshold, reflecting the V region's length and conservation
D         0.30            …             Lower threshold, reflecting the D region's short length and high mutation
J         0.80            …             High threshold, reflecting the J region's conserved nature

Special Case: Short-D Handling

Because D segments are short and highly mutated, the D likelihood vector includes an additional label, Short-D.

Short-D Logic

1. Detection: Short-D is flagged when its probability exceeds 0.5.
2. Suppression: a penalty is applied to the other D allele predictions.
3. Consistency: this keeps the D classification in agreement with the segmentation output.

Example Scenario

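Before Short-D Check:

Label         Likelihood
IGHD1-1*01    0.45
IGHD2-2*01    0.38
Short-D       0.65

The Short-D probability (0.65) exceeds 0.5, so the Short-D label is detected.

After Short-D Suppression:

IGHD1-1*01    Suppressed
IGHD2-2*01    Suppressed
Result        No D call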
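A minimal sketch of the Short-D check, assuming the D likelihoods are a mapping that includes a "Short-D" key. The source does not specify the penalty's exact form; here it is modelled, as an assumption, as full suppression of every D allele, which reproduces the example above. The helper name `apply_short_d` is hypothetical.

```python
from typing import Dict, Optional

SHORT_D = "Short-D"  # assumed label name in the D likelihood vector


def apply_short_d(
    d_likelihoods: Dict[str, float], cutoff: float = 0.5
) -> Optional[Dict[str, float]]:
    """Hypothetical helper: run the Short-D check before D thresholding.

    Returns None (no D call) when Short-D is detected; otherwise returns the
    D allele likelihoods without the Short-D entry, ready for thresholding.
    The penalty is modelled here as full suppression, which is an assumption.
    """
    # Detection: Short-D probability strictly above the cutoff.
    if d_likelihoods.get(SHORT_D, 0.0) > cutoff:
        # Suppression: every D allele is dropped, so no D call is made.
        return None
    # No Short-D detection: pass the allele likelihoods on unchanged.
    return {k: p for k, p in d_likelihoods.items() if k != SHORT_D}


before = {"IGHD1-1*01": 0.45, "IGHD2-2*01": 0.38, "Short-D": 0.65}
print(apply_short_d(before))  # None: no D call
```

When Short-D stays at or below 0.5, the remaining D likelihoods would then go through the ordinary thresholding with the D segment's Φ.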

Intuition

This method captures the probabilistic nature of the model's predictions while maintaining a clear cutoff to reduce noise.

Key Benefits:

Retains all plausible candidates above threshold
Avoids a fixed top-k selection: the number of calls adapts to how closely competing alleles score
Balances sensitivity and specificity
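To illustrate how a relative threshold differs from a fixed top-k rule, the sketch below compares the two on a confident and an ambiguous vector. The likelihoods and the labels A, B, C are toy placeholders, and the helper names are hypothetical; the cap is omitted to isolate the thresholding behaviour.

```python
from typing import Dict, List


def above_threshold(likelihoods: Dict[str, float], phi: float) -> List[str]:
    """Alleles with likelihood >= phi * max (no cap), best first."""
    p_max = max(likelihoods.values())
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    return [a for a in ranked if likelihoods[a] >= phi * p_max]


def top_k(likelihoods: Dict[str, float], k: int) -> List[str]:
    """Fixed top-k selection, for comparison."""
    return sorted(likelihoods, key=likelihoods.get, reverse=True)[:k]


# Toy vectors with placeholder labels, not real AlignAIR outputs.
confident = {"A": 0.95, "B": 0.10, "C": 0.05}
ambiguous = {"A": 0.60, "B": 0.58, "C": 0.55}

for name, vec in [("confident", confident), ("ambiguous", ambiguous)]:
    print(name, above_threshold(vec, phi=0.75), top_k(vec, k=2))
# confident ['A'] ['A', 'B']
# ambiguous ['A', 'B', 'C'] ['A', 'B']
```

The thresholded set contains one allele when the model is confident and all three when the scores are close, whereas top-k returns two alleles in both cases.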

Optimization Strategy

The optimal values of Φ and cap were selected via grid search to maximize agreement with ground truth labels.

Optimization Goals:

Sensitivity
Specificity
Efficiency
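The grid search described above can be sketched as follows. The actual agreement metric and search grid are not given here, so this is a hedged illustration: agreement is assumed to be the mean Jaccard overlap between predicted and true allele sets, the function names are hypothetical, and the candidate grids and the two labelled examples are made up for the demonstration.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def select(likelihoods: Dict[str, float], phi: float, cap: int) -> Set[str]:
    """Thresholding plus cap, as in the algorithm steps."""
    p_max = max(likelihoods.values())
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    return set([a for a in ranked if likelihoods[a] >= phi * p_max][:cap])


def jaccard(pred: Set[str], truth: Set[str]) -> float:
    """Assumed agreement metric: |pred ∩ truth| / |pred ∪ truth|."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


def grid_search(
    data: List[Tuple[Dict[str, float], Set[str]]],
    phis: List[float],
    caps: List[int],
) -> Tuple[float, int, float]:
    """Return (best_phi, best_cap, mean_agreement) over the grid."""
    best = (phis[0], caps[0], -1.0)
    for phi, cap in product(phis, caps):
        score = sum(jaccard(select(p, phi, cap), t) for p, t in data) / len(data)
        if score > best[2]:  # keep the first setting that reaches the best score
            best = (phi, cap, score)
    return best


# Toy labelled examples (likelihoods, true allele set); not real data.
data = [
    ({"A": 0.90, "B": 0.80, "C": 0.10}, {"A", "B"}),
    ({"A": 0.95, "B": 0.20, "C": 0.10}, {"A"}),
]
print(grid_search(data, phis=[0.3, 0.5, 0.75, 0.9], caps=[1, 2, 3]))
# (0.3, 2, 1.0)
```

In practice each segment would presumably be tuned on its own labelled data, which is consistent with the different per-segment defaults.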

References

See supplementary section 1.5.2 in the AlignAIR manuscript for full implementation details and performance analysis.
