Usage Guide

AlignAIR is run through its Docker container, which ships with pretrained models for heavy and light chain sequence alignment.

Quick Start

After starting the AlignAIR Docker container, run the following command inside it:

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --save-path=/data/output \
    --chain-type=heavy \
    --sequences=/app/tests/sample_HeavyChain_dataset.csv

💡 Tip: Adjust the paths and options to match your input file and the model you want to use.

Parameter Categories

Model Settings

  • model_checkpoint - Model weights path
  • chain_type - Heavy or light chain
  • max_input_size - Input window size
  • batch_size - Sequences per batch
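
Reads longer than max_input_size (576 by default) are trimmed during preprocessing, so it can be worth checking how many of your reads exceed the window before a run. The sketch below is a stand-alone helper, not part of AlignAIR; the function name and the toy reads are made up for illustration.

```python
def reads_over_limit(sequences, max_input_size=576):
    """Return the reads that are longer than the input window."""
    return [seq for seq in sequences if len(seq) > max_input_size]


# Toy reads of length 300, 576 and 610 (illustrative only).
reads = ["A" * 300, "C" * 576, "G" * 610]
too_long = reads_over_limit(reads)
print(f"{len(too_long)} of {len(reads)} reads are longer than the input window")
```

If a large share of reads is over the limit, a model trained with a larger window may be a better fit than relying on trimming.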

Input & Output

  • sequences - Input file path
  • save_path - Output directory
  • airr_format - Full AIRR schema

Thresholds

  • v_allele_threshold - V call threshold
  • d_allele_threshold - D call threshold
  • j_allele_threshold - J call threshold
  • v_cap / d_cap / j_cap - Max calls

Complete Parameter Reference

Parameter | Description | Default

Model Settings
model_checkpoint | Path to the model weights. The Docker image ships with IGH_S5F_576 and IGL_S5F_576. | Required
chain_type | Chain type to align: heavy or light. | Required
max_input_size | Maximum input window size. Longer reads are trimmed during preprocessing. | 576
batch_size | Number of sequences per batch. Larger values can improve runtime. | 2048

Input and Output
sequences | Path to the sequence file (CSV/TSV/FASTA). Tabular files must have a "sequence" column. | Required
save_path | Path where the output is saved (AIRR Schema CSV format). | Required
airr_format | Output the full AIRR Schema instead of the essential columns only. | false

Threshold Settings
v_allele_threshold | Threshold for V allele calling. Higher values are more stringent. | 0.75
d_allele_threshold | Threshold for D allele calling. The default is lower because of D region complexity. | 0.3
j_allele_threshold | Threshold for J allele calling. | 0.8
v_cap / d_cap / j_cap | Maximum number of calls allowed for V/D/J alleles. | 3

Preprocessing and Corrections
translate_to_asc | Output ASC alleles instead of IMGT names. | false
fix_orientation | Automatically correct reverse/complement orientations before alignment. | true
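
The reference gives defaults for the thresholds and caps but does not spell out how the two combine, or whether a threshold is an absolute score or a fraction of the best score. The sketch below is only one plausible reading, written to make the interaction concrete; it is not taken from AlignAIR's source, and the function name, the `relative` switch, and the scores are all hypothetical.

```python
def select_calls(scores, threshold, cap=3, relative=False):
    """Pick allele calls from a mapping of allele name -> score.

    Hypothetical logic for illustration; AlignAIR's actual rule may differ.
    """
    if not scores:
        return []
    # relative=True treats the threshold as a fraction of the best score.
    cutoff = threshold * max(scores.values()) if relative else threshold
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, score in ranked if score >= cutoff]
    # The cap bounds how many calls are reported, best first.
    return kept[:cap]


# Made-up scores for four V alleles.
v_scores = {
    "IGHV1-69*01": 0.91,
    "IGHV1-69*02": 0.70,
    "IGHV3-23*01": 0.40,
    "IGHV4-34*01": 0.05,
}
print(select_calls(v_scores, threshold=0.75))
print(select_calls(v_scores, threshold=0.75, relative=True))
```

Under either reading, raising the threshold shortens the list of calls, and the cap only matters when several alleles score close together.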

Example Commands

Heavy Chain Analysis

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --chain-type=heavy \
    --sequences=/data/input/heavy_sequences.csv \
    --save-path=/data/output/heavy_results \
    --v-allele-threshold=0.75 \
    --d-allele-threshold=0.3 \
    --j-allele-threshold=0.8
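
Tabular input must contain a "sequence" column, so a quick header check before launching a long run can save time. The helper below is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes the delimiter can be inferred from the file extension (.tsv means tab, anything else means comma).

```python
import csv
from pathlib import Path


def has_sequence_column(path):
    """Return True if a CSV/TSV file has a 'sequence' header column."""
    path = Path(path)
    # Assumption: .tsv files are tab-delimited, everything else uses commas.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return "sequence" in (name.strip() for name in header)
```

For example, has_sequence_column("/data/input/heavy_sequences.csv") should return True before the file is passed to --sequences. FASTA input has no header row and does not need this check.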

Light Chain Analysis

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGL_S5F_576 \
    --chain-type=light \
    --sequences=/data/input/light_sequences.csv \
    --save-path=/data/output/light_results \
    --airr-format \
    --fix-orientation
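
The parameter reference uses underscores (model_checkpoint) while the commands use hyphenated options (--model-checkpoint). When runs are scripted, a small helper can do that translation. This is a sketch, not an AlignAIR API: `build_command` is a hypothetical name, and it assumes every option follows the hyphenated pattern seen in the examples, with boolean options passed as bare switches.

```python
import shlex


def build_command(params):
    """Turn {parameter_name: value} into an argument list for `python app.py run`."""
    args = ["python", "app.py", "run"]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Bare switch, as with --airr-format in the light chain example.
            args.append(flag)
        elif value is False or value is None:
            # Assumption: leave the option out and let the default apply.
            # How to disable an option that defaults to true is not covered here.
            continue
        else:
            args.append(f"{flag}={value}")
    return args


light = build_command({
    "model_checkpoint": "/app/pretrained_models/IGL_S5F_576",
    "chain_type": "light",
    "sequences": "/data/input/light_sequences.csv",
    "save_path": "/data/output/light_results",
    "airr_format": True,
})
print(shlex.join(light))
```

The returned list can be passed directly to subprocess.run inside the container, which avoids shell quoting problems with unusual paths.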

Tips & Best Practices

Performance

  • Use larger batch sizes for better GPU utilization
  • Process sequences in batches of similar lengths
  • Monitor memory usage with large datasets
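
One way to act on the similar-lengths tip is to sort reads by length before writing the input file, so that consecutive batches hold reads of comparable size. The sketch below shows the idea on plain Python strings; the function name is hypothetical, and whether pre-sorting helps depends on how AlignAIR forms batches internally, which is not described here.

```python
def length_sorted_batches(sequences, batch_size=2048):
    """Yield lists of sequences so that each batch holds similar lengths."""
    ordered = sorted(sequences, key=len)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Toy reads with very different lengths (illustrative only).
reads = ["ACGT" * 90, "ACG", "ACGTACGT", "AC", "ACGT" * 80]
for batch in length_sorted_batches(reads, batch_size=2):
    print([len(seq) for seq in batch])
```

Sorting changes the row order, so keep a sequence identifier column if results need to be matched back to the original file.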

Accuracy

  • Use appropriate thresholds for your data quality
  • Enable orientation fixing for mixed datasets
  • Choose the correct chain type model
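
A "mixed" dataset here means reads that may arrive reversed, complemented, or both. The sketch below only illustrates what those orientations look like for a short nucleotide string; it does not show how AlignAIR detects or corrects them, and the function name is hypothetical.

```python
# Translation table for nucleotide complements (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def orientations(seq):
    """Return the four orientations a read can arrive in."""
    complement = seq.translate(_COMPLEMENT)
    return {
        "forward": seq,
        "reverse": seq[::-1],
        "complement": complement,
        "reverse_complement": complement[::-1],
    }


for name, variant in orientations("GAGGTGCAG").items():
    print(f"{name:>18}: {variant}")
```

If every read in a dataset is already known to be in the forward orientation, orientation fixing has nothing to correct; for data pooled from several sources it is the safer setting.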

Output

  • Enable AIRR format for downstream analysis
  • Save prediction objects for debugging
  • Use meaningful output directory names
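
As a small example of downstream analysis, the sketch below tallies allele calls from a results table. It assumes the output CSV has a v_call column, as in the AIRR Rearrangement schema, and that multiple calls for one read are comma-separated in a single field; both are assumptions about the output, and the sample rows are made up so the block runs without an AlignAIR result file.

```python
import csv
import io
from collections import Counter


def count_calls(handle, column="v_call"):
    """Count the values in one column of a CSV results table."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumption: ambiguous calls are comma-separated within the field.
        for call in (row.get(column) or "").split(","):
            if call.strip():
                counts[call.strip()] += 1
    return counts


# Made-up rows standing in for a real results file.
sample = io.StringIO(
    "sequence_id,v_call,j_call\n"
    'seq1,"IGHV1-69*01,IGHV1-69*02",IGHJ4*02\n'
    "seq2,IGHV3-23*01,IGHJ6*02\n"
    "seq3,IGHV1-69*01,IGHJ4*02\n"
)
for allele, n in count_calls(sample).most_common():
    print(allele, n)
```

For a real run, open the CSV written under the save path with open(path, newline="") and pass the file handle in place of the sample.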

