// api reference
API Reference
Complete reference for the AlignAIR command-line interface, parameters, input formats, and output schemas.
// 01 / cli
Command Line Interface
basic syntax
python app.py [COMMAND] [OPTIONS]
available commands
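| Command | Role | Description |
|---|---|---|
| run | primary | Execute AlignAIR sequence analysis with specified parameters. |
| --help | utility | Display help information and parameter list. |
| --version | utility | Show AlignAIR version information. |

exit codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | File not found |

// 02 / parameters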
Complete parameter reference
required
| Parameter | Type | Description | Example |
|---|---|---|---|
| --model-checkpoint | string | Path to trained model weights | /app/pretrained_models/IGH_S5F_576 |
| --chain-type | choice | heavy or light | heavy |
| --sequences | string | Path to input sequence file | /data/input/sequences.csv |
| --save-path | string | Output directory path | /data/output |
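
As an illustration of how the four required parameters and the documented exit codes fit together, the following Python sketch assembles a `python app.py run` invocation and translates the return code into the meaning listed in this reference. The helper names (`build_run_command`, `describe_exit`, `run_alignair`) are hypothetical and not part of AlignAIR; the paths in the usage note are the example values from the table above.

```python
import subprocess

# Exit codes as listed in this reference.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Invalid arguments",
    3: "File not found",
}


def build_run_command(model_checkpoint, chain_type, sequences, save_path):
    """Assemble the argv list for `python app.py run` with the required parameters."""
    if chain_type not in ("heavy", "light"):
        raise ValueError("chain_type must be 'heavy' or 'light'")
    return [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]


def describe_exit(returncode):
    """Translate a process return code into its documented meaning."""
    return EXIT_CODES.get(returncode, f"Undocumented exit code {returncode}")


def run_alignair(model_checkpoint, chain_type, sequences, save_path):
    """Hypothetical wrapper: run the CLI and return (returncode, meaning)."""
    cmd = build_run_command(model_checkpoint, chain_type, sequences, save_path)
    result = subprocess.run(cmd)
    return result.returncode, describe_exit(result.returncode)
```

A call such as `run_alignair("/app/pretrained_models/IGH_S5F_576", "heavy", "/data/input/sequences.csv", "/data/output")` would need to be made from the directory that contains `app.py`.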
model configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| --max-input-size | int | 576 | Maximum sequence length |
| --batch-size | int | 2048 | Processing batch size |
threshold settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| --v-allele-threshold | float | 0.75 | V gene threshold |
| --d-allele-threshold | float | 0.3 | D gene threshold |
| --j-allele-threshold | float | 0.8 | J gene threshold |
| --v-cap / --d-cap / --j-cap | int | 3 | Maximum calls per gene |
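
This reference does not spell out how a threshold and a cap combine. Purely as an illustration, the sketch below shows one plausible reading: keep every allele whose score is at least `threshold` times the best score, then truncate to `cap` calls. The selection rule, the function name `select_calls`, and the scores are all assumptions made for this example and may not match AlignAIR's actual behavior.

```python
def select_calls(likelihoods, threshold, cap):
    """Keep alleles scoring at least threshold * best score, at most `cap` of them.

    ASSUMPTION: this rule is illustrative only, not AlignAIR's documented logic.
    """
    if not likelihoods:
        return []
    # Rank alleles from highest to lowest score.
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    cutoff = threshold * ranked[0][1]
    kept = [name for name, score in ranked if score >= cutoff]
    return kept[:cap]


# Made-up scores for three V alleles.
scores = {"IGHV1-2*01": 0.90, "IGHV1-3*01": 0.72, "IGHV1-8*01": 0.10}

print(select_calls(scores, threshold=0.75, cap=3))  # defaults for the V gene
print(select_calls(scores, threshold=0.9, cap=1))   # stricter settings
```

Under this reading, raising the threshold or lowering the cap can only shorten the list of calls.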
processing options
| Flag | Description |
|---|---|
| --airr-format | Output full AIRR schema |
| --fix-orientation | Auto-correct sequence orientation |
| --translate-to-asc | Use ASC allele names |
| --save-predict-object | Save raw predictions for debugging |
// 03 / input formats
Input formats
CSV
Required column: sequence
sequence_id,sequence
seq_001,CAGGTGCAGCTG…
seq_002,GAGGTGCAGCTG…
TSV
Tab-separated values
sequence_id	sequence
seq_001	CAGGTGCAGCTG…
seq_002	GAGGTGCAGCTG…
FASTA
Standard >header format
>seq_001
CAGGTGCAGCTGGTGGAG…
>seq_002
GAGGTGCAGCTGGTGGAG…
// valid sequences
- DNA nucleotides: A, T, G, C
- IUPAC ambiguous codes: N, R, Y, etc.
- Minimum length: 50 nucleotides
- Maximum length: longer sequences are auto-trimmed to --max-input-size
// invalid sequences
- Protein sequences (amino acids)
- Empty or whitespace-only sequences
- Sequences with invalid characters
- Extremely short sequences (< 50 nt)
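
The input formats and validity rules above can be pre-checked before a run. The sketch below, using only the Python standard library, reads CSV/TSV or FASTA text and reports why a sequence would be rejected. It is not AlignAIR's own loader: the function names are hypothetical, the fallback identifier `row_N` for rows without a `sequence_id` is an assumption, and so is the choice to upper-case input before checking. The alphabet is the full IUPAC nucleotide set, of which this reference names only "N, R, Y, etc.".

```python
import csv
import io

MIN_LENGTH = 50  # minimum length stated in this reference
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")


def read_table(handle, delimiter=","):
    """Yield (sequence_id, sequence) from CSV/TSV text with a `sequence` column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("missing required column: sequence")
    for index, row in enumerate(reader, start=1):
        # ASSUMPTION: rows without a sequence_id get a positional placeholder.
        yield row.get("sequence_id") or f"row_{index}", row["sequence"]


def read_fasta(handle):
    """Yield (sequence_id, sequence) from FASTA text, joining wrapped lines."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def check_sequence(sequence):
    """Return None if the sequence passes the listed rules, else a short reason."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        return "empty or whitespace-only"
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        return "invalid characters: " + "".join(invalid)
    if len(cleaned) < MIN_LENGTH:
        return f"too short ({len(cleaned)} nt, minimum {MIN_LENGTH})"
    return None


fasta_text = ">seq_001\n" + "CAGGTGCAGCTG" * 5 + "\n>seq_002\nGAGGTG\n"
for seq_id, seq in read_fasta(io.StringIO(fasta_text)):
    print(seq_id, check_sequence(seq) or "ok")
```

The character check catches most protein input, because amino-acid letters such as E, F, L, P, and Q are not nucleotide codes, but a protein made up only of letters shared with the IUPAC nucleotide alphabet would slip through.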
// 04 / output schema
Output schema
standard columns
| Column | Type | Description | Example |
|---|---|---|---|
| sequence_id | string | Unique sequence identifier | seq_001 |
| v_call | string | V gene assignment(s) | IGHV1-2*01 |
| d_call | string | D gene assignment(s) | IGHD3-3*01 |
| j_call | string | J gene assignment(s) | IGHJ4*01 |
| productive | boolean | Sequence productivity status | True |
| sequence | string | Input sequence (preserved) | CAGGTG… |
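
A short sketch of how rows with these columns might be consumed downstream. Since a gene-call column can hold several comma-separated assignments, the example splits each call column into a list and converts `productive` to a Python boolean. This reference does not name the output file or its delimiter, so the CSV layout and the sample row below are assumptions for illustration, and `parse_row` is a hypothetical helper.

```python
import csv
import io

GENE_CALL_COLUMNS = ("v_call", "d_call", "j_call")


def parse_row(row):
    """Convert one output row (a dict of strings) into typed Python values."""
    parsed = dict(row)
    for column in GENE_CALL_COLUMNS:
        value = row.get(column) or ""
        # Multiple assignments are comma-separated within a single field.
        parsed[column] = [call.strip() for call in value.split(",") if call.strip()]
    parsed["productive"] = (row.get("productive") or "").strip().lower() == "true"
    return parsed


# ASSUMPTION: a made-up CSV row in the shape of the standard columns.
sample = (
    "sequence_id,v_call,d_call,j_call,productive,sequence\n"
    'seq_001,"IGHV1-2*01,IGHV1-3*01",IGHD3-3*01,IGHJ4*01,True,CAGGTG\n'
)

for row in csv.DictReader(io.StringIO(sample)):
    record = parse_row(row)
    print(record["sequence_id"], record["v_call"], record["productive"])
```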
AIRR schema (--airr-format)
Additional columns
- sequence_alignment: Aligned sequence
- germline_alignment: Germline reference
- v_sequence_start / v_sequence_end
- d_sequence_start / d_sequence_end
Standardized fields
- j_sequence_start / j_sequence_end
- cdr3 / cdr3_aa
- mutation_count
- sequence_length
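
The start/end fields are documented in this reference as 1-indexed. Assuming they are also end-inclusive, as in the AIRR Rearrangement schema, a segment can be cut out of the input sequence with a Python slice that shifts only the start. The helper name `extract_segment` and the coordinates used here are illustrative, not taken from real output.

```python
def extract_segment(sequence, start, end):
    """Return the segment for 1-indexed, end-inclusive coordinates.

    ASSUMPTION: `end` is inclusive; this reference only states 1-indexing.
    """
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(
            f"coordinates {start}-{end} out of range for length {len(sequence)}"
        )
    # Python slices are 0-indexed and end-exclusive, so only the start shifts.
    return sequence[start - 1:end]


sequence = "CAGGTGCAGCTG"
print(extract_segment(sequence, 1, 6))   # first six nucleotides
print(extract_segment(sequence, 7, 12))  # last six nucleotides
```

With this convention the segment length is `end - start + 1`.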
data types & formats
Gene calls
- Single: IGHV1-2*01
- Multiple: IGHV1-2*01,IGHV1-3*01
Coordinates
- 1-indexed positions: v_start: 1, v_end: 285
Sequences
- Uppercase nucleotides: CAGGTGCAGCTG…
// quick reference
Quick reference card
Minimal command
python app.py run --model-checkpoint=… --chain-type=… --sequences=… --save-path=…
High stringency
--v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1
Large datasets
--batch-size=4096 --fix-orientation
Full output
--airr-format --save-predict-object
© 2025 AlignAIR. All rights reserved. · Advancing computational biology through AI