// usage

Usage Guide

AlignAIR ships as a Docker container with a CLI entry point. Run it locally, in CI, or on a cluster.

// quickstart

Quick start

After starting the AlignAIR Docker container, run the following command inside it:

python app.py run \
  --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
  --save-path=/data/output \
  --chain-type=heavy \
  --sequences=/app/tests/sample_HeavyChain_dataset.csv

tip  Replace the checkpoint, chain type, and file paths with values that match your own input and model.

// parameters

Parameter categories

01 / model

Model settings

  • model_checkpoint
  • chain_type
  • max_input_size
  • batch_size

02 / io

Input & output

  • sequences
  • save_path
  • airr_format

03 / thresholds

Thresholds

  • v_allele_threshold
  • d_allele_threshold
  • j_allele_threshold
  • v_cap / d_cap / j_cap

04 / preprocessing

Preprocessing & corrections

  • translate_to_asc
  • fix_orientation

// reference

Complete parameter reference

Parameter | Description | Default

Model settings
model_checkpoint | Path to model weights. The Docker image ships with IGH_S5F_576 and IGL_S5F_576. | Required
chain_type | Chain type of the input: heavy or light. | Required
max_input_size | Maximum input window size. Longer reads are trimmed during preprocessing. | 576
batch_size | Number of sequences per batch. Larger values can improve runtime. | 2048

Input & output
sequences | Path to the sequence file (CSV, TSV, or FASTA). CSV and TSV files must have a "sequence" column. | Required
save_path | Output directory. Results are written as CSV following the AIRR Schema. | Required
airr_format | Emit the full AIRR Schema instead of only the essential columns. | false

Thresholds
v_allele_threshold | V call threshold. Higher = more stringent. | 0.75
d_allele_threshold | D call threshold. Lower by default because of D region complexity. | 0.3
j_allele_threshold | J call threshold. | 0.8
v_cap / d_cap / j_cap | Maximum number of allele calls allowed for V, D, and J. | 3

Preprocessing & corrections
translate_to_asc | Output ASC alleles instead of IMGT names. | false
fix_orientation | Automatically correct reverse/complement orientations before alignment. | true
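Since CSV and TSV inputs must have a "sequence" column, it can be worth checking a file before starting a long run. The following is a minimal sketch using only the Python standard library. The helper name `has_sequence_column` is hypothetical and not part of AlignAIR, which performs its own input handling.

```python
import csv
import io


def has_sequence_column(handle, delimiter=","):
    """Return True if the header row of a CSV/TSV stream has a 'sequence' column.

    Hypothetical helper for illustration; not part of AlignAIR.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return "sequence" in [name.strip() for name in header]


# In-memory examples; for a real file, pass an open file handle instead
# and use delimiter="\t" for TSV.
good = io.StringIO("id,sequence\nread1,CAGGTGCAGCTG\n")
bad = io.StringIO("id,seq\nread1,CAGGTGCAGCTG\n")
print(has_sequence_column(good))
print(has_sequence_column(bad))
```

The check only reads the header row, so it stays cheap on large files.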
// examples

Example commands

heavy chain

python app.py run \
  --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
  --chain-type=heavy \
  --sequences=/data/input/heavy_sequences.csv \
  --save-path=/data/output/heavy_results \
  --v-allele-threshold=0.75 \
  --d-allele-threshold=0.3 \
  --j-allele-threshold=0.8

light chain

python app.py run \
  --model-checkpoint=/app/pretrained_models/IGL_S5F_576 \
  --chain-type=light \
  --sequences=/data/input/light_sequences.csv \
  --save-path=/data/output/light_results \
  --airr-format \
  --fix-orientation
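When runs are scripted, for example in CI, the command can be assembled from a dictionary of parameters. The sketch below assumes the conventions visible in the example commands: underscores in parameter names become hyphens, valued options are written as --name=value, and boolean options are bare flags. The function name `build_alignair_command` is hypothetical. How to switch off an option that defaults to true, such as fix_orientation, is not covered in this guide, so the sketch simply omits false values.

```python
import shlex


def build_alignair_command(params):
    """Turn a dict of parameters into an argv list for `python app.py run`.

    Assumed conventions, inferred from the example commands in this guide:
    underscores become hyphens, valued options use --name=value, and
    boolean options are passed as bare flags when True.
    """
    argv = ["python", "app.py", "run"]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # bare switch, e.g. --airr-format
        elif value is False or value is None:
            continue  # omitted, so the tool's default applies
        else:
            argv.append(f"{flag}={value}")
    return argv


command = build_alignair_command({
    "model_checkpoint": "/app/pretrained_models/IGH_S5F_576",
    "chain_type": "heavy",
    "sequences": "/data/input/heavy_sequences.csv",
    "save_path": "/data/output/heavy_results",
    "v_allele_threshold": 0.75,
    "airr_format": True,
})
print(shlex.join(command))
```

Inside the container, the resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with unusual file paths.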
// best practices

Tips & best practices

// performance

Performance

  • Use larger batch sizes for better GPU utilization
  • Process sequences in batches of similar lengths
  • Monitor memory usage with large datasets
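One way to group sequences of similar length is to sort by length before splitting the input into chunks. This is a minimal sketch with a hypothetical helper, `batches_by_length`. How much it helps depends on how the model pads its input, which this guide does not specify.

```python
def batches_by_length(sequences, batch_size):
    """Yield lists of (original_index, sequence) pairs, grouped so that
    each batch holds sequences of similar length."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort indices by sequence length; the indices let results be
    # mapped back to the original input order afterwards.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    for start in range(0, len(order), batch_size):
        yield [(i, sequences[i]) for i in order[start:start + batch_size]]


reads = ["ACGTACGTAC", "ACG", "ACGTACG", "AC", "ACGTACGTACGT"]
for batch in batches_by_length(reads, batch_size=2):
    print([len(seq) for _, seq in batch])
```

Each chunk could then be written to its own input file and processed as a separate run.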
// accuracy

Accuracy

  • Use appropriate thresholds for your data quality
  • Enable orientation fixing for mixed datasets
  • Choose the correct chain type model
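To build intuition for how a threshold and a cap interact, the sketch below uses an assumed selection rule: keep every allele whose likelihood is at least `threshold` times the best likelihood, then report at most `cap` of them. This rule is an assumption for illustration and is not taken from the AlignAIR source. The function name `select_calls` and the likelihood values are made up.

```python
def select_calls(likelihoods, threshold, cap):
    """Keep alleles whose likelihood is at least `threshold` times the best
    likelihood, then return at most `cap` of them, best first.

    Assumed rule for illustration only; not taken from the AlignAIR source.
    """
    if not likelihoods:
        return []
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    cutoff = threshold * ranked[0][1]
    kept = [allele for allele, score in ranked if score >= cutoff]
    return kept[:cap]


# Made-up likelihoods for three V alleles.
scores = {"IGHV1-69*01": 0.90, "IGHV1-69*02": 0.70, "IGHV3-23*01": 0.10}
print(select_calls(scores, threshold=0.75, cap=3))
print(select_calls(scores, threshold=0.90, cap=3))
```

Under this assumed rule, raising the threshold from 0.75 to 0.90 drops the second allele, which matches the "higher = more stringent" behavior described in the parameter reference.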
// output

Output

  • Enable AIRR format for downstream analysis
  • Save prediction objects for debugging
  • Use meaningful output directory names
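As a small example of downstream analysis, the sketch below counts V allele usage in an AIRR-style CSV. It assumes the output has a `v_call` column, as in the AIRR Rearrangement schema, and that multiple calls for one sequence are comma-separated. Both are assumptions about the output layout, and the sample rows are made up. The helper name `count_v_calls` is hypothetical.

```python
import csv
import io
from collections import Counter


def count_v_calls(handle):
    """Count how often each V allele appears in the v_call column of an
    AIRR-style CSV stream. Comma-separated multi-allele calls are split."""
    counts = Counter()
    for row in csv.DictReader(handle):
        for allele in (row.get("v_call") or "").split(","):
            allele = allele.strip()
            if allele:
                counts[allele] += 1
    return counts


# Made-up rows standing in for an output file; open your own results CSV instead.
sample = io.StringIO(
    "sequence_id,v_call,j_call\n"
    'r1,"IGHV1-69*01,IGHV1-69*02",IGHJ4*02\n'
    "r2,IGHV1-69*01,IGHJ6*02\n"
)
print(count_v_calls(sample).most_common())
```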
// next

Ready for advanced usage?

Explore technical details, examples, and troubleshooting guides to get the most out of AlignAIR.

© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI