Usage Guide
AlignAIR is used through its Docker container, which bundles the pretrained models and the app.py command-line entry point for sequence alignment tasks.
Quick Start
After starting the AlignAIR Docker container, run the following command inside it:
python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --save-path=/data/output \
    --chain-type=heavy \
    --sequences=/app/tests/sample_HeavyChain_dataset.csv
💡 Tip: Modify the parameters as needed to match your input and model requirements.
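When the command has to be launched from a script, assembling the arguments in Python can make the parameters easier to vary. The sketch below is a hypothetical helper, not part of AlignAIR: `build_alignair_command` is an invented name, and it assumes that every parameter in this guide maps to a hyphenated `--flag=value` option, as the example commands do, and that boolean options are passed as bare flags.

```python
import shlex


def build_alignair_command(model_checkpoint, save_path, chain_type, sequences, **options):
    """Build the argument list for `python app.py run` (hypothetical helper)."""
    if chain_type not in ("heavy", "light"):
        raise ValueError(f"chain_type must be 'heavy' or 'light', got {chain_type!r}")
    cmd = [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--save-path={save_path}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
    ]
    for name, value in options.items():
        # Assumption: parameter names map to flags by swapping "_" for "-".
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)  # boolean options are passed as bare flags
        elif value is not False and value is not None:
            cmd.append(f"{flag}={value}")
        # False/None values are omitted; disabling a default-true option is not covered here.
    return cmd


cmd = build_alignair_command(
    model_checkpoint="/app/pretrained_models/IGH_S5F_576",
    save_path="/data/output",
    chain_type="heavy",
    sequences="/app/tests/sample_HeavyChain_dataset.csv",
    batch_size=2048,
)
print(shlex.join(cmd))
```

Inside the container, the resulting list could be passed to `subprocess.run(cmd, check=True)`.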
Parameter Categories
Model Settings
- model_checkpoint: Model weights path
- chain_type: Heavy or light chain
- max_input_size: Input window size
- batch_size: Sequences per batch
Input & Output
- sequences: Input file path
- save_path: Output directory
- airr_format: Full AIRR schema
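Tabular input must contain a "sequence" column, so it can help to check the header before starting a long run. The following is a minimal sketch of such a check; `has_sequence_column` is a hypothetical helper, and the exact, case-sensitive match on the column name is an assumption rather than documented AlignAIR behavior.

```python
import csv
import io


def has_sequence_column(handle, delimiter=","):
    """Return True if the first row of a CSV/TSV file names a 'sequence' column."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return "sequence" in [name.strip() for name in header]


# Made-up two-line table standing in for a real input file.
sample_csv = io.StringIO("sequence_id,sequence\nseq1,CAGGTGCAGCTG\n")
print(has_sequence_column(sample_csv))
```

For a TSV file, pass `delimiter="\t"`.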
Thresholds
- v_allele_threshold: V call threshold
- d_allele_threshold: D call threshold
- j_allele_threshold: J call threshold
- v_cap / d_cap / j_cap: Max calls
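To build intuition for how a threshold and a cap can interact, here is an illustrative sketch. It assumes one plausible rule: keep every allele whose score is at least `threshold` times the top score, then truncate to `cap` calls. This guide does not specify the rule AlignAIR actually applies, so treat the function, its name, and the made-up scores as assumptions.

```python
def select_allele_calls(scores, threshold, cap):
    """Keep alleles scoring at least threshold x the top score, at most `cap` of them.

    Assumed rule for illustration only; not taken from the AlignAIR source.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    cutoff = threshold * ranked[0][1]
    kept = [allele for allele, score in ranked if score >= cutoff]
    return kept[:cap]


# Made-up likelihoods for three V alleles.
v_scores = {"IGHV1-69*01": 0.90, "IGHV1-69*02": 0.70, "IGHV3-23*01": 0.10}
print(select_allele_calls(v_scores, threshold=0.75, cap=3))
```

Under this assumed rule, raising the threshold shrinks the set of reported calls, which matches the "more stringent" behavior described for higher values.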
Complete Parameter Reference
Parameter | Description | Default |
---|---|---|
Model Settings | | |
model_checkpoint | Path to model weights. The Docker image ships with IGH_S5F_576 and IGL_S5F_576 | Required |
chain_type | Chain type to align: heavy or light | Required |
max_input_size | Maximum input window size. Longer reads are trimmed during preprocessing | 576 |
batch_size | Number of sequences per batch. Larger values can improve runtime | 2048 |
Input and Output | | |
sequences | Path to the sequence file (CSV/TSV/FASTA). Tabular files must have a "sequence" column | Required |
save_path | Path to save output (CSV in AIRR Schema format) | Required |
airr_format | Output the full AIRR Schema instead of essential columns only | false |
Threshold Settings | | |
v_allele_threshold | Threshold for V allele calling. Higher values are more stringent | 0.75 |
d_allele_threshold | Threshold for D allele calling. The default is lower because of D region complexity | 0.3 |
j_allele_threshold | Threshold for J allele calling | 0.8 |
v_cap / d_cap / j_cap | Maximum number of calls allowed for V/D/J alleles | 3 |
Preprocessing and Corrections | | |
translate_to_asc | Output ASC alleles instead of IMGT names | false |
fix_orientation | Automatically correct reversed or complemented sequence orientations before alignment | true |
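For readers unfamiliar with the orientations that fix_orientation refers to, the sketch below shows the standard string transformations for a DNA read: reversed, complemented, and reverse-complemented. It only illustrates the transformations. How AlignAIR detects which orientation a read is in is not described in this guide, and the helper names are hypothetical.

```python
# Translation table for nucleotide complements (N stays N; case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq):
    """Swap each base for its complement without changing the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement the read and reverse it, giving the opposite strand 5' to 3'."""
    return complement(seq)[::-1]


def candidate_orientations(seq):
    """Return the four orientations a read may arrive in."""
    return {
        "forward": seq,
        "reversed": seq[::-1],
        "complement": complement(seq),
        "reverse_complement": reverse_complement(seq),
    }


for name, oriented in candidate_orientations("ATGC").items():
    print(name, oriented)
```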
Example Commands
Heavy Chain Analysis
python app.py run \
    --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
    --chain-type=heavy \
    --sequences=/data/input/heavy_sequences.csv \
    --save-path=/data/output/heavy_results \
    --v-allele-threshold=0.75 \
    --d-allele-threshold=0.3 \
    --j-allele-threshold=0.8
Light Chain Analysis
python app.py run \
    --model-checkpoint=/app/pretrained_models/IGL_S5F_576 \
    --chain-type=light \
    --sequences=/data/input/light_sequences.csv \
    --save-path=/data/output/light_results \
    --airr-format \
    --fix-orientation
Tips & Best Practices
Performance
- Use larger batch sizes for better GPU utilization
- Process sequences in batches of similar lengths
- Monitor memory usage with large datasets
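One way to group reads of similar length is to sort them by length before splitting them into batches. The sketch below shows the idea on an in-memory list; `length_sorted_batches` is a hypothetical helper, and whether AlignAIR already orders reads internally is not stated in this guide. In practice the same ordering could be applied to the input file before it is passed to `--sequences`.

```python
def length_sorted_batches(sequences, batch_size):
    """Sort reads by length, then split them into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    ordered = sorted(sequences, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Made-up reads of different lengths.
reads = ["ACGTACGTAC", "ACG", "ACGTACG", "AC", "ACGTA"]
for batch in length_sorted_batches(reads, batch_size=2):
    print([len(seq) for seq in batch])
```

Sorting changes the row order, so keep a sequence identifier column if results need to be matched back to the original input.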
Accuracy
- Use appropriate thresholds for your data quality
- Enable orientation fixing for mixed datasets
- Choose the correct chain type model
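A small lookup can help avoid pairing a chain type with the wrong model. This sketch uses the two checkpoint paths that appear in the example commands of this guide; the `checkpoint_for` helper itself is hypothetical.

```python
# Checkpoint paths as used in the example commands of this guide.
PRETRAINED_CHECKPOINTS = {
    "heavy": "/app/pretrained_models/IGH_S5F_576",
    "light": "/app/pretrained_models/IGL_S5F_576",
}


def checkpoint_for(chain_type):
    """Return the bundled checkpoint path that matches a chain type."""
    try:
        return PRETRAINED_CHECKPOINTS[chain_type]
    except KeyError:
        raise ValueError(f"unknown chain type: {chain_type!r}") from None


print(checkpoint_for("light"))
```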
Output
- Enable AIRR format for downstream analysis
- Save prediction objects for debugging
- Use meaningful output directory names
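As an example of downstream analysis, the sketch below tallies V calls from a results table. It assumes the output CSV has a `v_call` column, which is the field name in the AIRR Rearrangement schema, and that multiple calls for one read are comma-separated within that field. The exact layout of AlignAIR's output is not shown in this guide, so both points are assumptions, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_v_calls(handle):
    """Count how often each V allele appears in the v_call column."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # A missing or empty field is treated as "no call".
        for call in (row.get("v_call") or "").split(","):
            call = call.strip()
            if call:
                counts[call] += 1
    return counts


# Made-up rows standing in for a real results file.
sample = io.StringIO(
    "sequence_id,v_call\n"
    'seq1,"IGHV1-69*01,IGHV1-69*02"\n'
    "seq2,IGHV1-69*01\n"
)
for allele, n in count_v_calls(sample).most_common():
    print(allele, n)
```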
© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI.