Documentation

Usage Guide

AlignAIR runs through its Docker container, which ships with pretrained models and provides a command-line interface for sequence alignment tasks.

Quick Start

After starting the AlignAIR Docker container, run the following command inside it:

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --save-path=/data/output \
    --chain-type=heavy \
    --sequences=/app/tests/sample_HeavyChain_dataset.csv

💡 Tip: Modify the parameters as needed to match your input and model requirements.
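
When the same run has to be repeated with different parameters, it can help to assemble the command programmatically. The following is a minimal sketch, not part of AlignAIR: `build_command` is a hypothetical helper, and it assumes that every parameter maps to a flag by replacing underscores with hyphens, as the commands in this guide suggest.

```python
import shlex


def build_command(params):
    """Turn a dict of parameters into an argv list for `python app.py run`.

    Hypothetical helper: assumes each parameter name maps to a flag by
    swapping underscores for hyphens (model_checkpoint -> --model-checkpoint).
    """
    argv = ["python", "app.py", "run"]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switch, passed without a value (e.g. --airr-format).
            argv.append(flag)
        elif value is False or value is None:
            # Omitted entirely. How to switch off an option that defaults
            # to true is not covered by this guide.
            continue
        else:
            argv.append(f"{flag}={value}")
    return argv


# Parameters taken from the Quick Start command above.
quick_start = {
    "model_checkpoint": "/app/pretrained_models/IGH_S5F_576",
    "save_path": "/data/output",
    "chain_type": "heavy",
    "sequences": "/app/tests/sample_HeavyChain_dataset.csv",
}

command = build_command(quick_start)
print(shlex.join(command))
```

Inside the container, the resulting list could be handed to `subprocess.run(command, check=True)`.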

Parameter Categories

Model Settings

• model_checkpoint - Model weights path
• chain_type - Heavy or light chain
• max_input_size - Input window size
• batch_size - Sequences per batch

Input & Output

• sequences - Input file path
• save_path - Output directory
• airr_format - Full AIRR schema

Thresholds

• v_allele_threshold - V call threshold
• d_allele_threshold - D call threshold
• j_allele_threshold - J call threshold
• v_cap / d_cap / j_cap - Max calls

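Complete Parameter Reference

Parameter names below are written with underscores; the example commands in this guide pass them as hyphenated flags (for instance, `model_checkpoint` is given as `--model-checkpoint`).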

Parameter	Description	Default
Model Settings
`model_checkpoint`	Path to model weights. Docker ships with IGH_S5F_576 and IGL_S5F_576	Required
`chain_type`	Chain type to align: heavy or light	Required
`max_input_size`	Maximum input window size. Longer reads are trimmed during preprocessing	576
`batch_size`	Number of sequences per batch. Larger values can reduce runtime but use more memory	2048
Input and Output
`sequences`	Path to sequence file (CSV/TSV/FASTA). Tabular files (CSV/TSV) must contain a "sequence" column	Required
`save_path`	Path to save output (AIRR Schema CSV format)	Required
`airr_format`	Output full AIRR Schema instead of essential columns only	false
Threshold Settings
`v_allele_threshold`	Threshold for V allele calling. Higher values are more stringent	0.75
`d_allele_threshold`	Threshold for D allele calling. The default is lower than for V and J because of D region complexity	0.3
`j_allele_threshold`	Threshold for J allele calling	0.8
`v_cap` / `d_cap` / `j_cap`	Maximum number of allele calls allowed for V, D, and J, respectively	3
Preprocessing and Corrections
`translate_to_asc`	Output ASC alleles instead of IMGT names	false
`fix_orientation`	Automatically correct reverse/complement orientations before alignment	true
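
Because tabular inputs must contain a "sequence" column, a quick check before launching a long run can save time. The sketch below is an illustration only: `has_sequence_column` is a hypothetical helper, and it assumes the delimiter can be chosen from the file extension (tab for `.tsv`, comma otherwise).

```python
import csv
import tempfile
from pathlib import Path


def has_sequence_column(path):
    """Return True if a CSV/TSV file has a "sequence" column in its header row."""
    path = Path(path)
    # Assumption: .tsv files are tab-delimited, everything else comma-delimited.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="") as handle:
        # An empty file yields an empty header and therefore False.
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return "sequence" in [name.strip() for name in header]


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_text("id,sequence\nread1,CAGGTGCAGCTG\n")
    print(has_sequence_column(demo))
```

FASTA inputs are not covered by this check, since they have no header row.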

Example Commands

Heavy Chain Analysis

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --chain-type=heavy \
    --sequences=/data/input/heavy_sequences.csv \
    --save-path=/data/output/heavy_results \
    --v-allele-threshold=0.75 \
    --d-allele-threshold=0.3 \
    --j-allele-threshold=0.8

Light Chain Analysis

python app.py run \
    --model-checkpoint=/app/pretrained_models/IGL_S5F_576 \
    --chain-type=light \
    --sequences=/data/input/light_sequences.csv \
    --save-path=/data/output/light_results \
    --airr-format \
    --fix-orientation

Tips & Best Practices

Performance

• Use larger batch sizes for better GPU utilization
• Process sequences in batches of similar lengths
• Monitor memory usage with large datasets
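
One way to group sequences of similar length is to sort them by length before splitting them into batches. This is a generic sketch rather than AlignAIR code: `length_sorted_batches` is a hypothetical helper, and whether it helps in practice depends on how AlignAIR pads and batches its input, which this guide does not describe.

```python
def length_sorted_batches(sequences, batch_size):
    """Yield lists of at most `batch_size` sequences, ordered by length."""
    ordered = sorted(sequences, key=len)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Made-up reads of different lengths, for illustration only.
reads = ["ACGTACGTACGT", "AC", "ACGTACGT", "A", "ACGTA"]
batches = list(length_sorted_batches(reads, batch_size=2))

for batch in batches:
    print([len(read) for read in batch])
```

Sorting changes the order of the reads, so keep an identifier alongside each sequence if the results need to be matched back to the original input.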

Accuracy

• Use appropriate thresholds for your data quality
• Enable orientation fixing for mixed datasets
• Choose the correct chain type model
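
To build intuition for how a threshold and a cap interact, consider the toy example below. It is not AlignAIR's algorithm: this guide does not say how the thresholds are applied internally, so the sketch assumes the simplest rule, keeping alleles whose score is at least the threshold and returning at most `cap` of them, best first. The helper `select_calls` and the scores are invented for illustration.

```python
def select_calls(scores, threshold, cap):
    """Keep alleles scoring >= threshold, best first, at most `cap` of them.

    Assumed rule for illustration only; the actual selection logic in
    AlignAIR may differ.
    """
    passing = [(allele, score) for allele, score in scores.items() if score >= threshold]
    passing.sort(key=lambda item: item[1], reverse=True)
    return [allele for allele, _ in passing[:cap]]


# Invented scores for a single read.
v_scores = {
    "IGHV1-2*02": 0.91,
    "IGHV1-2*04": 0.80,
    "IGHV1-3*01": 0.40,
    "IGHV1-8*01": 0.78,
    "IGHV1-18*01": 0.76,
}

default_calls = select_calls(v_scores, threshold=0.75, cap=3)
strict_calls = select_calls(v_scores, threshold=0.90, cap=3)
print(default_calls)
print(strict_calls)
```

Under this assumed rule, raising the threshold shortens the list of calls, while the cap only matters when many alleles pass the threshold.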

Output

• Enable AIRR format for downstream analysis
• Save prediction objects for debugging
• Use meaningful output directory names

