API Reference
Complete reference for AlignAIR command-line interface, parameters, input formats, and output schemas.
Command Line Interface
Basic Syntax
python app.py [COMMAND] [OPTIONS]
Available Commands
Command | Type | Description |
---|---|---|
run | Primary | Execute AlignAIR sequence analysis with specified parameters |
--help | Utility | Display help information and parameter list |
--version | Utility | Show AlignAIR version information |
Exit Codes
Code | Meaning |
---|---|
0 | Success |
1 | General Error |
2 | Invalid Arguments |
3 | File Not Found |

Complete Parameters Reference
Required Parameters
Parameter | Type | Description | Example |
---|---|---|---|
--model-checkpoint | string | Path to trained model weights | /app/pretrained_models/IGH_S5F_576 |
--chain-type | choice | Chain type: heavy or light | heavy |
--sequences | string | Path to input sequence file | /data/input/sequences.csv |
--save-path | string | Output directory path | /data/output |
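The four required parameters can be assembled programmatically. The following is a minimal sketch, not part of AlignAIR itself: `build_run_command` and `run_alignair` are hypothetical helper names, the `--flag=value` form is taken from the Quick Reference Card, and the paths are the example values from the table above.

```python
import subprocess

# Exit codes as listed in the "Exit Codes" section of this reference.
EXIT_CODES = {
    0: "Success",
    1: "General Error",
    2: "Invalid Arguments",
    3: "File Not Found",
}


def build_run_command(model_checkpoint, chain_type, sequences, save_path):
    """Return the argv list for `python app.py run` with the required parameters."""
    if chain_type not in ("heavy", "light"):
        raise ValueError("chain_type must be 'heavy' or 'light'")
    return [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]


def run_alignair(command):
    """Run the command and return (exit_code, meaning). Hypothetical helper."""
    completed = subprocess.run(command)
    meaning = EXIT_CODES.get(completed.returncode, "Unknown exit code")
    return completed.returncode, meaning


# Example values copied from the Required Parameters table.
command = build_run_command(
    "/app/pretrained_models/IGH_S5F_576",
    "heavy",
    "/data/input/sequences.csv",
    "/data/output",
)
print(" ".join(command))
```

Passing the command as a list avoids shell quoting issues with paths that contain spaces. Calling `run_alignair(command)` would then execute it from the directory that contains app.py.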
Optional Parameters
Model Configuration
Parameter | Type | Default | Description |
---|---|---|---|
--max-input-size | int | 576 | Maximum sequence length |
--batch-size | int | 2048 | Processing batch size |

Threshold Settings
Parameter | Type | Default | Description |
---|---|---|---|
--v-allele-threshold | float | 0.75 | V gene threshold |
--d-allele-threshold | float | 0.3 | D gene threshold |
--j-allele-threshold | float | 0.8 | J gene threshold |
--v-cap / --d-cap / --j-cap | int | 3 | Maximum calls per gene |

Processing Options
Parameter | Type | Description |
---|---|---|
--airr-format | flag | Output full AIRR schema |
--fix-orientation | flag | Auto-correct sequence orientation |
--translate-to-asc | flag | Use ASC allele names |
--save-predict-object | flag | Save raw predictions for debugging |

Input Formats
CSV Format
Required column: sequence
Example:
    sequence_id,sequence
    seq_001,CAGGTGCAGCTG...
    seq_002,GAGGTGCAGCTG...
TSV Format
Delimiter: tab
Example:
    sequence_id	sequence
    seq_001	CAGGTGCAGCTG...
    seq_002	GAGGTGCAGCTG...
FASTA Format
Standard format: >header
Example:
    >seq_001
    CAGGTGCAGCTGGTGGAG...
    >seq_002
    GAGGTGCAGCTGGTGGAG...
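To check an input file before a run, the three formats above can be read with the Python standard library. This is a sketch under stated assumptions, not AlignAIR's own loader: the function names are hypothetical, the mapping from file extension to format (.csv, .tsv, .fasta, .fa) is a guess, and the `row_N` fallback identifier is invented for rows without a sequence_id.

```python
import csv
from pathlib import Path


def read_fasta(path):
    """Yield (sequence_id, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                parts = line[1:].split()
                header, chunks = (parts[0] if parts else ""), []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_table(path, delimiter):
    """Yield (sequence_id, sequence) pairs from a CSV or TSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if not reader.fieldnames or "sequence" not in reader.fieldnames:
            raise ValueError("input file needs a 'sequence' column")
        for index, row in enumerate(reader, start=1):
            # Fallback ID is an illustration only, not AlignAIR behavior.
            yield row.get("sequence_id") or f"row_{index}", row["sequence"]


def read_sequences(path):
    """Pick a reader from the file extension (assumed mapping)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return list(read_table(path, ","))
    if suffix == ".tsv":
        return list(read_table(path, "\t"))
    if suffix in (".fasta", ".fa"):
        return list(read_fasta(path))
    raise ValueError(f"unsupported extension: {suffix!r}")
```

For example, `read_sequences("/data/input/sequences.csv")` would return a list of `(sequence_id, sequence)` tuples.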
Input Validation
✅ Valid Sequences
- DNA nucleotides: A, T, G, C
- IUPAC ambiguous codes: N, R, Y, etc.
- Minimum length: 50 nucleotides
- Maximum length: sequences longer than --max-input-size are trimmed automatically
❌ Invalid Sequences
- Protein sequences (amino acids)
- Empty or whitespace-only sequences
- Sequences with invalid characters
- Sequences shorter than 50 nt
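The rules above can be approximated in a pre-flight check. The sketch below is not AlignAIR's validator: `check_sequence` and `exceeds_max_input_size` are hypothetical names, the accepted alphabet is assumed to be the full IUPAC nucleotide set, and lowercase input is assumed to be acceptable after uppercasing.

```python
# Full IUPAC nucleotide alphabet (assumption: the tool accepts all of it).
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")
MIN_LENGTH = 50  # minimum length stated in this reference


def check_sequence(sequence):
    """Return None if the sequence looks valid, else a short problem description."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        return "empty or whitespace-only"
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        return "invalid characters: " + "".join(invalid)
    if len(cleaned) < MIN_LENGTH:
        return f"shorter than {MIN_LENGTH} nt"
    return None


def exceeds_max_input_size(sequence, max_input_size=576):
    """True if the sequence is longer than --max-input-size and would be trimmed."""
    return len(sequence.strip()) > max_input_size
```

A character check cannot catch every protein sequence, because a peptide made up only of letters that are also IUPAC nucleotide codes (for example A, C, G, T) passes it. Most real protein sequences contain letters such as L, E, or P and are rejected.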
Output Schema
Standard Output Columns
Column | Type | Description | Example |
---|---|---|---|
sequence_id | string | Unique sequence identifier | seq_001 |
v_call | string | V gene assignment(s) | IGHV1-2*01 |
d_call | string | D gene assignment(s) | IGHD3-3*01 |
j_call | string | J gene assignment(s) | IGHJ4*01 |
productive | boolean | Sequence productivity status | True |
sequence | string | Input sequence (preserved) | CAGGTG... |
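Downstream scripts can check for the standard columns before using a result file. This sketch assumes the output is delimited text with a header row. The delimiter, the serialization of booleans, and the sample row are assumptions built from the example values above; `parse_rows` is a hypothetical helper.

```python
import csv
import io

STANDARD_COLUMNS = [
    "sequence_id", "v_call", "d_call", "j_call", "productive", "sequence",
]


def parse_rows(text, delimiter=","):
    """Parse delimited output text into dicts and convert `productive` to bool."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in STANDARD_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Accept a few common spellings of true; the real format may differ.
        row["productive"] = row["productive"].strip().lower() in ("true", "t", "1")
        rows.append(row)
    return rows


# Illustrative sample only; a multi-call field must be quoted in CSV.
SAMPLE = (
    "sequence_id,v_call,d_call,j_call,productive,sequence\n"
    'seq_001,"IGHV1-2*01,IGHV1-3*01",IGHD3-3*01,IGHJ4*01,True,CAGGTG...\n'
)

for row in parse_rows(SAMPLE):
    print(row["sequence_id"], row["v_call"], row["productive"])
```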
AIRR Schema Output (--airr-format)
Additional Columns
- sequence_alignment: Aligned sequence
- germline_alignment: Germline reference
- v_sequence_start: V region start position
- v_sequence_end: V region end position
- d_sequence_start: D region start position
- d_sequence_end: D region end position
Standardized Fields
- j_sequence_start: J region start position
- j_sequence_end: J region end position
- cdr3: CDR3 sequence
- cdr3_aa: CDR3 amino acid sequence
- mutation_count: Number of mutations
- sequence_length: Total sequence length
Data Types & Formats
Gene Calls
Single call: IGHV1-2*01
Multiple calls: IGHV1-2*01,IGHV1-3*01
Coordinates
1-indexed positions: v_start: 1, v_end: 285
Sequences
Uppercase nucleotides: CAGGTGCAGCTG...
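The conventions above translate into a few small helpers. This is a sketch with hypothetical function names. It treats the end coordinate as inclusive, which follows the AIRR convention of 1-based closed intervals but is not stated in this reference.

```python
def split_calls(call_field):
    """Split a comma-separated gene call field into a list of calls."""
    return [call.strip() for call in call_field.split(",") if call.strip()]


def gene_and_allele(call):
    """Split 'IGHV1-2*01' into ('IGHV1-2', '01'); allele is '' if absent."""
    gene, _, allele = call.partition("*")
    return gene, allele


def region(sequence, start, end):
    """Extract a region given 1-indexed start and inclusive end (assumption)."""
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("coordinates out of range")
    # A 1-indexed inclusive range [start, end] maps to the slice [start-1:end].
    return sequence[start - 1:end]


print(split_calls("IGHV1-2*01,IGHV1-3*01"))
print(gene_and_allele("IGHV1-2*01"))
print(region("CAGGTGCAGCTG", 1, 3))
```

With this mapping, the coordinates 1 and 285 select the first 285 nucleotides of the sequence.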
📋 Quick Reference Card
Minimal Command
python app.py run --model-checkpoint=... --chain-type=... --sequences=... --save-path=...
High Stringency
--v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1
Large Datasets
--batch-size=4096 --fix-orientation
Full Output
--airr-format --save-predict-object
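The option groups on this card can be layered onto the minimal command. A minimal sketch: the preset names and the `with_presets` helper are hypothetical, the option strings are copied from the card, and the `...` placeholders are left for the caller to fill in.

```python
# Minimal command from the card; replace each "..." with a real value.
BASE_COMMAND = [
    "python", "app.py", "run",
    "--model-checkpoint=...",
    "--chain-type=...",
    "--sequences=...",
    "--save-path=...",
]

# Option groups copied from the Quick Reference Card.
PRESETS = {
    "high_stringency": [
        "--v-allele-threshold=0.9", "--j-allele-threshold=0.9",
        "--v-cap=1", "--j-cap=1",
    ],
    "large_datasets": ["--batch-size=4096", "--fix-orientation"],
    "full_output": ["--airr-format", "--save-predict-object"],
}


def with_presets(base_command, *preset_names):
    """Return a copy of base_command with the named option groups appended."""
    command = list(base_command)
    for name in preset_names:
        for option in PRESETS[name]:
            if option not in command:  # skip options already present
                command.append(option)
    return command


print(" ".join(with_presets(BASE_COMMAND, "high_stringency", "full_output")))
```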
© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI.