// usage
Usage Guide
AlignAIR ships as a Docker container with a CLI entry point. Run it locally, in CI, or on a cluster.
// quickstart
Quick start
After starting the AlignAIR Docker container, run the following command inside it:
    python app.py run \
      --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
      --save-path=/data/output \
      --chain-type=heavy \
      --sequences=/app/tests/sample_HeavyChain_dataset.csv

Tip: Modify the parameters to match your input and model.
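When runs are scripted (for example in CI), it can help to assemble the command programmatically rather than editing a long shell line. The following is a minimal sketch, not part of AlignAIR: `build_alignair_command` is a hypothetical helper, and it assumes that every parameter maps to a flag by replacing underscores with hyphens (as `--model-checkpoint` and `--v-allele-threshold` do in this guide) and that boolean options are passed as bare flags (as `--airr-format` is in the examples).

```python
import shlex


def build_alignair_command(model_checkpoint, chain_type, sequences, save_path, **options):
    """Assemble the argument list for `python app.py run` (hypothetical helper)."""
    args = [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]
    for name, value in options.items():
        # Assumption: parameter names map to flags by swapping "_" for "-".
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Assumption: boolean options are bare flags, e.g. --airr-format.
            args.append(flag)
        elif value is not False and value is not None:
            args.append(f"{flag}={value}")
        # False/None values are skipped; how to disable a default-true option
        # is not covered by this guide, so it is not guessed here.
    return args


cmd = build_alignair_command(
    model_checkpoint="/app/pretrained_models/IGH_S5F_576",
    chain_type="heavy",
    sequences="/app/tests/sample_HeavyChain_dataset.csv",
    save_path="/data/output",
    batch_size=2048,
)
print(shlex.join(cmd))
# To execute inside the container: subprocess.run(cmd, check=True)
```

Building a list instead of a single string avoids shell-quoting problems when paths contain spaces.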
// parameters
Parameter categories
01 / model
Model settings
- model_checkpoint
- chain_type
- max_input_size
- batch_size
02 / io
Input & output
- sequences
- save_path
- airr_format
03 / thresholds
Thresholds
- v_allele_threshold
- d_allele_threshold
- j_allele_threshold
- v_cap / d_cap / j_cap
04 / preprocessing
Preprocessing & corrections
- translate_to_asc
- fix_orientation
// reference
Complete parameter reference
| Parameter | Description | Default |
|---|---|---|
| Model settings | | |
| model_checkpoint | Path to model weights. Docker ships with IGH_S5F_576 and IGL_S5F_576. | Required |
| chain_type | Heavy or light chain. | Required |
| max_input_size | Maximum input window size. Longer reads are trimmed during preprocessing. | 576 |
| batch_size | Number of sequences per batch. Larger values can improve runtime. | 2048 |
| Input & output | | |
| sequences | Path to sequence file (CSV/TSV/FASTA). CSV/TSV must have a "sequence" column. | Required |
| save_path | Output directory (AIRR Schema CSV format). | Required |
| airr_format | Emit full AIRR Schema instead of essential columns only. | false |
| Thresholds | | |
| v_allele_threshold | V call threshold. Higher = more stringent. | 0.75 |
| d_allele_threshold | D call threshold. Lower due to D region complexity. | 0.3 |
| j_allele_threshold | J call threshold. | 0.8 |
| v_cap / d_cap / j_cap | Maximum number of calls allowed for V/D/J alleles. | 3 |
| Preprocessing & corrections | | |
| translate_to_asc | Output ASC alleles instead of IMGT names. | false |
| fix_orientation | Automatically correct reverse/complement orientations before alignment. | true |
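Because CSV/TSV inputs must contain a "sequence" column, a quick pre-flight check can catch a malformed file before a long run starts. This is a rough sketch using only the Python standard library; `check_sequence_input` is a hypothetical helper, and the mapping from file extension to format (`.csv`, `.tsv`, `.fasta`, `.fa`) is an assumption rather than AlignAIR's own detection logic.

```python
import csv
from pathlib import Path


def check_sequence_input(path):
    """Return the number of records, or raise ValueError if the file looks unusable."""
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(newline="") as handle:
        if suffix in {".csv", ".tsv"}:
            delimiter = "," if suffix == ".csv" else "\t"
            reader = csv.DictReader(handle, delimiter=delimiter)
            # The guide requires a column literally named "sequence".
            if "sequence" not in (reader.fieldnames or []):
                raise ValueError(f'{path.name}: no "sequence" column found')
            return sum(1 for _ in reader)
        if suffix in {".fasta", ".fa"}:
            # Count FASTA records by their ">" header lines.
            return sum(1 for line in handle if line.startswith(">"))
    raise ValueError(f"{path.name}: unsupported extension {suffix!r}")
```

Calling `check_sequence_input("/data/input/heavy_sequences.csv")` before `python app.py run` would then fail fast with a readable message instead of partway through preprocessing.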
// examples
Example commands
heavy chain
    python app.py run \
      --model-checkpoint=/app/pretrained_models/IGH_S5F_576 \
      --chain-type=heavy \
      --sequences=/data/input/heavy_sequences.csv \
      --save-path=/data/output/heavy_results \
      --v-allele-threshold=0.75 \
      --d-allele-threshold=0.3 \
      --j-allele-threshold=0.8
light chain
    python app.py run \
      --model-checkpoint=/app/pretrained_models/IGL_S5F_576 \
      --chain-type=light \
      --sequences=/data/input/light_sequences.csv \
      --save-path=/data/output/light_results \
      --airr-format \
      --fix-orientation
// best practices
Tips & best practices
// performance
Performance
- Increase batch_size (default 2048) for better GPU utilization, as long as each batch still fits in memory
- Process sequences in batches of similar lengths
- Monitor memory usage with large datasets
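One way to act on the "similar lengths" tip is to sort reads by length before splitting a large dataset into chunks. The sketch below is illustrative only: `length_sorted_batches` is a hypothetical helper, and whether AlignAIR already groups or pads sequences internally is not stated in this guide. Lengths are capped at `max_input_size` for sorting purposes, since longer reads are trimmed during preprocessing anyway.

```python
def length_sorted_batches(sequences, batch_size=2048, max_input_size=576):
    """Yield lists of (original_index, sequence), grouped by similar length."""
    # Sort indices by effective length; reads above max_input_size sort together.
    order = sorted(
        range(len(sequences)),
        key=lambda i: min(len(sequences[i]), max_input_size),
    )
    for start in range(0, len(order), batch_size):
        yield [(i, sequences[i]) for i in order[start:start + batch_size]]


reads = ["ACGTACGT", "AC", "ACGTA", "ACG"]
batches = list(length_sorted_batches(reads, batch_size=2))
# batches[0] holds the two shortest reads, batches[1] the two longest.
```

Keeping the original index alongside each read makes it possible to restore the input order after results are collected.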
// accuracy
Accuracy
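- Tune the V/D/J allele thresholds to your data quality; higher values are more stringent
- Keep fix_orientation enabled (the default) for datasets with mixed read orientations
- Match the model checkpoint to the chain type (IGH_S5F_576 for heavy, IGL_S5F_576 for light)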
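To make "mixed orientations" concrete: a read can arrive as-is, reversed, complemented, or reverse-complemented relative to the reference. The sketch below only enumerates those four forms with the standard nucleotide complement; `candidate_orientations` is a hypothetical name, and how AlignAIR decides which orientation a read is in is not described here.

```python
# Standard nucleotide complement table; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def candidate_orientations(seq):
    """Return the four orientations a nucleotide read may appear in."""
    complement = seq.translate(_COMPLEMENT)
    return {
        "forward": seq,
        "reverse": seq[::-1],
        "complement": complement,
        "reverse_complement": complement[::-1],
    }


forms = candidate_orientations("AACG")
# forms["reverse_complement"] is "CGTT"
```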
// output
Output
- Enable airr_format to emit the full AIRR Schema for downstream analysis
- Save prediction objects for debugging
- Use meaningful output directory names
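As a small example of downstream analysis, the sketch below tallies allele calls from a results CSV. It rests on several assumptions that this guide does not confirm: that the output contains the AIRR field `v_call` (also `d_call` and `j_call`), that multiple calls for one sequence are comma-separated, and that the file path passed in is wherever your run wrote its CSV. `count_calls` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_calls(csv_path, column="v_call"):
    """Count how often each allele appears in one call column of a results CSV."""
    counts = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumption: several calls for one read are comma-separated.
            for call in (row.get(column) or "").split(","):
                call = call.strip()
                if call:
                    counts[call] += 1
    return counts
```

For instance, `count_calls(path, "v_call").most_common(10)` would list the ten most frequent V alleles in a run.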
// next
Ready for advanced usage?
Explore technical details, examples, and troubleshooting guides to get the most out of AlignAIR.