API Reference
Complete reference for the AlignAIR command-line interface: parameters, input formats, and output schemas.
Command Line Interface
Basic Syntax
python app.py [COMMAND] [OPTIONS]
Available Commands
| Command | Type | Description |
|---|---|---|
| run | Primary | Execute AlignAIR sequence analysis with specified parameters |
| --help | Utility | Display help information and parameter list |
| --version | Utility | Show AlignAIR version information |
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General Error |
| 2 | Invalid Arguments |
| 3 | File Not Found |

Complete Parameters Reference
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| --model-checkpoint | string | Path to trained model weights | /app/pretrained_models/IGH_S5F_576 |
| --chain-type | choice | Chain type: heavy or light | heavy |
| --sequences | string | Path to input sequence file | /data/input/sequences.csv |
| --save-path | string | Output directory path | /data/output |
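
As an illustration of how the four required parameters and the exit codes fit together, the following is a minimal sketch of a Python wrapper around the CLI. The helper names `build_command`, `describe_exit_code`, and `run_alignair` are hypothetical and not part of AlignAIR; the `--name=value` form follows the minimal command shown in the Quick Reference Card, and the sketch assumes `app.py` is in the current working directory.

```python
import subprocess

# Exit codes as listed in the Exit Codes table of this reference.
EXIT_CODES = {
    0: "Success",
    1: "General Error",
    2: "Invalid Arguments",
    3: "File Not Found",
}


def build_command(model_checkpoint, chain_type, sequences, save_path):
    """Assemble the argument list for `python app.py run` (hypothetical helper)."""
    if chain_type not in ("heavy", "light"):
        raise ValueError("chain_type must be 'heavy' or 'light'")
    return [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]


def describe_exit_code(code):
    """Translate a process exit code into the meaning documented above."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_alignair(**params):
    """Run the CLI and return (exit_code, meaning). Assumes app.py is in the cwd."""
    result = subprocess.run(build_command(**params))
    return result.returncode, describe_exit_code(result.returncode)
```

Calling `run_alignair(model_checkpoint=..., chain_type="heavy", sequences=..., save_path=...)` would launch the analysis; passing the arguments as a list avoids shell quoting issues with paths that contain spaces.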
Optional Parameters
Model Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| --max-input-size | int | 576 | Maximum sequence length |
| --batch-size | int | 2048 | Processing batch size |

Threshold Settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| --v-allele-threshold | float | 0.75 | V gene threshold |
| --d-allele-threshold | float | 0.3 | D gene threshold |
| --j-allele-threshold | float | 0.8 | J gene threshold |
| --v-cap / --d-cap / --j-cap | int | 3 | Maximum calls per gene |

Processing Options
| Parameter | Type | Description |
|---|---|---|
| --airr-format | flag | Output full AIRR schema |
| --fix-orientation | flag | Auto-correct sequence orientation |
| --translate-to-asc | flag | Use ASC allele names |
| --save-predict-object | flag | Save raw predictions for debugging |

Input Formats
CSV Format
Required Column: sequence
Example:

    sequence_id,sequence
    seq_001,CAGGTGCAGCTG...
    seq_002,GAGGTGCAGCTG...

TSV Format
Delimiter: Tab-separated
Example:

    sequence_id	sequence
    seq_001	CAGGTGCAGCTG...
    seq_002	GAGGTGCAGCTG...

FASTA Format
Standard Format: >header
Example:

    >seq_001
    CAGGTGCAGCTGGTGGAG...
    >seq_002
    GAGGTGCAGCTGGTGGAG...
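
To make the three input layouts concrete, here is a minimal sketch of how such files could be read into `(sequence_id, sequence)` pairs before being handed to the tool. It is not AlignAIR's own loader: `read_delimited` and `read_fasta` are hypothetical helpers, and the fallback ID for rows without a `sequence_id` column is an assumption of this sketch.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Return (sequence_id, sequence) pairs from CSV text, or TSV with delimiter='\t'."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("input must contain a 'sequence' column")
    records = []
    for index, row in enumerate(reader, start=1):
        # Only `sequence` is required; generate an ID when none is given
        # (an assumption of this sketch, not documented behavior).
        seq_id = row.get("sequence_id") or f"seq_{index:03d}"
        records.append((seq_id, (row["sequence"] or "").strip()))
    return records


def read_fasta(text):
    """Return (header, sequence) pairs, joining sequences wrapped over several lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Both helpers take text rather than a path, so they can be checked against small in-memory samples before pointing them at real files.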
Input Validation
✅ Valid Sequences
- DNA nucleotides: A, T, G, C
- IUPAC ambiguous codes: N, R, Y, etc.
- Minimum length: 50 nucleotides
- Maximum length: auto-trimmed to --max-input-size
❌ Invalid Sequences
- Protein sequences (amino acids)
- Empty or whitespace-only sequences
- Sequences with invalid characters
- Sequences shorter than 50 nt
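
The rules above can be approximated in a pre-flight check. The sketch below is an assumption-laden illustration, not AlignAIR's actual validator: `validate_sequence` is a hypothetical helper, the accepted alphabet is the standard IUPAC nucleotide code set, and trimming is assumed to keep the leading `max_input_size` characters (the reference does not say which end is trimmed). Note that an alphabet check alone cannot catch every protein sequence, since several amino acid letters are also IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes: four bases plus ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")
MIN_LENGTH = 50               # minimum length stated in this reference
DEFAULT_MAX_INPUT_SIZE = 576  # default of --max-input-size


def validate_sequence(sequence, max_input_size=DEFAULT_MAX_INPUT_SIZE):
    """Return (cleaned_sequence, error); exactly one of the two is None."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        return None, "empty sequence"
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        return None, "invalid characters: " + "".join(invalid)
    if len(cleaned) < MIN_LENGTH:
        return None, f"too short ({len(cleaned)} nt, minimum {MIN_LENGTH})"
    # Assumption: trimming keeps the first max_input_size nucleotides.
    return cleaned[:max_input_size], None
```

Returning an error string instead of raising lets a caller collect all rejected records in one pass and report them together.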
Output Schema
Standard Output Columns
| Column | Type | Description | Example |
|---|---|---|---|
| sequence_id | string | Unique sequence identifier | seq_001 |
| v_call | string | V gene assignment(s) | IGHV1-2*01 |
| d_call | string | D gene assignment(s) | IGHD3-3*01 |
| j_call | string | J gene assignment(s) | IGHJ4*01 |
| productive | boolean | Sequence productivity status | True |
| sequence | string | Input sequence (preserved) | CAGGTG... |

AIRR Schema Output (--airr-format)
Additional Columns
- sequence_alignment: Aligned sequence
- germline_alignment: Germline reference
- v_sequence_start: V region start position
- v_sequence_end: V region end position
- d_sequence_start: D region start position
- d_sequence_end: D region end position
Standardized Fields
- j_sequence_start: J region start position
- j_sequence_end: J region end position
- cdr3: CDR3 sequence
- cdr3_aa: CDR3 amino acid sequence
- mutation_count: Number of mutations
- sequence_length: Total sequence length
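
When consuming these columns downstream, two details matter: call fields may hold several comma-separated alleles, and positions are 1-indexed (both stated in the Data Types & Formats section of this reference). The sketch below shows one way to handle them. `split_calls` and `extract_region` are hypothetical helpers, and treating the end position as inclusive is an assumption (it matches the AIRR convention of 1-based closed intervals, but should be verified against real output).

```python
def split_calls(call_field):
    """Split a gene call field such as 'IGHV1-2*01,IGHV1-3*01' into a list."""
    return [call.strip() for call in call_field.split(",") if call.strip()]


def extract_region(sequence, start, end):
    """Slice a region given 1-indexed start and (assumed) inclusive end positions."""
    start, end = int(start), int(end)
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"positions {start}-{end} are out of range")
    # Convert to Python's 0-indexed, end-exclusive slice.
    return sequence[start - 1:end]


# Illustrative row only; the values are not real AlignAIR output.
row = {
    "sequence": "CAGGTGCAGCTG",
    "v_call": "IGHV1-2*01,IGHV1-3*01",
    "v_sequence_start": "1",
    "v_sequence_end": "6",
}
v_calls = split_calls(row["v_call"])
v_region = extract_region(row["sequence"], row["v_sequence_start"], row["v_sequence_end"])
```

Positions are converted with `int` because values read from a delimited text file arrive as strings.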
Data Types & Formats
Gene Calls
Single call: IGHV1-2*01
Multiple calls: IGHV1-2*01,IGHV1-3*01
Coordinates
1-indexed positions: v_start: 1, v_end: 285
Sequences
Uppercase nucleotides: CAGGTGCAGCTG...
📋 Quick Reference Card
Minimal Command

    python app.py run --model-checkpoint=... --chain-type=... --sequences=... --save-path=...

High Stringency

    --v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1

Large Datasets

    --batch-size=4096 --fix-orientation

Full Output

    --airr-format --save-predict-object

© 2025 AlignAIR. All rights reserved. • Advancing computational biology through AI