// api reference

API Reference

Complete reference for AlignAIR command-line interface, parameters, input formats, and output schemas.

// 01 / cli

Command Line Interface

basic syntax

python app.py [COMMAND] [OPTIONS]

available commands

Command    Category  Description
run        primary   Execute AlignAIR sequence analysis with the specified parameters.
--help     utility   Display help information and the parameter list.
--version  utility   Show AlignAIR version information.

exit codes

Code  Meaning
0     Success
1     General error
2     Invalid arguments
3     File not found
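
The exit codes above can be checked from a wrapper script. The following is a minimal sketch, assuming `app.py` sits in the working directory and is launched with the current Python interpreter. The helper names `describe_exit_code` and `run_alignair` are hypothetical and not part of AlignAIR.

```python
import subprocess
import sys

# Exit codes exactly as listed in the reference above.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Invalid arguments",
    3: "File not found",
}


def describe_exit_code(code: int) -> str:
    """Map a process exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"Undocumented exit code {code}")


def run_alignair(args: list, app_path: str = "app.py") -> int:
    """Run `python app.py <args>` and return the exit code (hypothetical wrapper)."""
    completed = subprocess.run([sys.executable, app_path, *args])
    # Report the documented meaning on stderr so stdout stays clean.
    print(describe_exit_code(completed.returncode), file=sys.stderr)
    return completed.returncode
```

A caller could then branch on the return value, for example retrying with corrected paths when the code is 3.
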
// 02 / parameters

Complete parameters reference

required

Parameter           Type    Description                    Example
--model-checkpoint  string  Path to trained model weights  /app/pretrained_models/IGH_S5F_576
--chain-type        choice  heavy | light                  heavy
--sequences         string  Path to input sequence file    /data/input/sequences.csv
--save-path         string  Output directory path          /data/output
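
When AlignAIR is driven from a script, the four required parameters can be assembled programmatically. The sketch below assumes the `--name=value` form used in the quick reference card. The function `build_run_command` is a hypothetical helper, not part of AlignAIR.

```python
def build_run_command(model_checkpoint, chain_type, sequences, save_path, **options):
    """Build the argument list for `python app.py run` (hypothetical helper).

    Keyword options use underscores and are converted to dashed flags,
    e.g. batch_size=4096 becomes --batch-size=4096. A value of True emits
    a bare flag; False or None omits the option.
    """
    if chain_type not in ("heavy", "light"):
        raise ValueError("chain_type must be 'heavy' or 'light'")
    cmd = [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)              # boolean flag such as --airr-format
        elif value is not False and value is not None:
            cmd.append(f"{flag}={value}")  # valued option such as --batch-size
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.
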

model configuration

Parameter         Type (default)       Description
--max-input-size  int (default: 576)   Maximum sequence length
--batch-size      int (default: 2048)  Processing batch size

threshold settings

Parameter                    Type (default)         Description
--v-allele-threshold         float (default: 0.75)  V gene threshold
--d-allele-threshold         float (default: 0.3)   D gene threshold
--j-allele-threshold         float (default: 0.8)   J gene threshold
--v-cap / --d-cap / --j-cap  int (default: 3)       Maximum calls per gene
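
This reference does not spell out how a threshold and a cap interact. One plausible reading, shown purely as an illustration and not as AlignAIR's actual selection rule, is to keep the alleles whose score reaches the threshold and then retain at most `cap` of the highest-scoring ones. The function `select_calls` and the example scores are hypothetical.

```python
def select_calls(scores: dict, threshold: float, cap: int) -> list:
    """Illustrative only: filter alleles by score, then keep the top `cap`.

    `scores` maps allele name -> score in [0, 1]. This is an assumed
    interpretation of the threshold/cap parameters, not AlignAIR's code.
    """
    passing = [(name, score) for name, score in scores.items() if score >= threshold]
    # Highest score first; sorted() is stable, so ties keep input order.
    passing = sorted(passing, key=lambda item: item[1], reverse=True)
    return [name for name, _ in passing[:cap]]
```

Under this reading, raising the threshold or lowering the cap both reduce the number of alleles reported per gene, which matches the "high stringency" settings in the quick reference card.
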

processing options

Parameter              Type  Description
--airr-format          flag  Output full AIRR schema
--fix-orientation      flag  Auto-correct sequence orientation
--translate-to-asc     flag  Use ASC allele names
--save-predict-object  flag  Save raw predictions for debugging
// 03 / input formats

Input formats

CSV

Required column: sequence

sequence_id,sequence
seq_001,CAGGTGCAGCTG…
seq_002,GAGGTGCAGCTG…

TSV

Tab-separated values

sequence_id	sequence
seq_001	CAGGTGCAGCTG…
seq_002	GAGGTGCAGCTG…

FASTA

Standard >header format

>seq_001
CAGGTGCAGCTGGTGGAG…
>seq_002
GAGGTGCAGCTGGTGGAG…
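
To make the three layouts concrete, the sketch below parses each of them into the same list of `(sequence_id, sequence)` pairs using only the standard library. It is an illustration of the formats, not AlignAIR's own loader. The function names `parse_table` and `parse_fasta` are hypothetical.

```python
import csv
import io


def parse_table(text: str, delimiter: str = ",") -> list:
    """Parse CSV (default) or TSV (delimiter='\t') text with a 'sequence' column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("missing required 'sequence' column")
    # sequence_id is optional here; fall back to an empty string if absent.
    return [(row.get("sequence_id") or "", row["sequence"]) for row in reader]


def parse_fasta(text: str) -> list:
    """Parse FASTA text; sequence lines under one header are concatenated."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```
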

// valid sequences

  • DNA nucleotides: A, T, G, C
  • IUPAC ambiguous codes: N, R, Y, etc.
  • Minimum length: 50 nucleotides
  • Maximum length: sequences longer than --max-input-size are trimmed automatically

// invalid sequences

  • Protein sequences (amino acids)
  • Empty or whitespace-only sequences
  • Sequences with invalid characters
  • Extremely short sequences (< 50 nt)
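
The rules above can be applied as a pre-flight check before submitting a file. This is a minimal sketch under two assumptions that the reference does not state: that the full set of IUPAC nucleotide codes is accepted (the list above names only N, R, Y, "etc."), and that lowercase input is treated like uppercase. The function `validation_error` is hypothetical.

```python
from typing import Optional

# Standard IUPAC DNA codes: the four bases plus the ambiguity symbols.
IUPAC_NUCLEOTIDES = frozenset("ACGTRYSWKMBDHVN")
MIN_LENGTH = 50  # minimum length in nucleotides, from the list above


def validation_error(sequence: str) -> Optional[str]:
    """Return a reason the sequence is invalid, or None if it passes."""
    seq = sequence.strip().upper()  # assumption: case-insensitive input
    if not seq:
        return "empty or whitespace-only sequence"
    invalid = sorted(set(seq) - IUPAC_NUCLEOTIDES)
    if invalid:
        # Protein sequences usually land here: letters such as E, F, I, L, P, Q
        # are not nucleotide codes.
        return "invalid characters: " + "".join(invalid)
    if len(seq) < MIN_LENGTH:
        return f"too short ({len(seq)} nt, minimum {MIN_LENGTH})"
    return None
```

A protein made up only of letters that are also nucleotide codes (for example A, C, G, T) would slip through a character check like this one, so it is a coarse filter rather than a guarantee.
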
// 04 / output schema

Output schema

standard columns

Column       Type     Description                   Example
sequence_id  string   Unique sequence identifier    seq_001
v_call       string   V gene assignment(s)          IGHV1-2*01
d_call       string   D gene assignment(s)          IGHD3-3*01
j_call       string   J gene assignment(s)          IGHJ4*01
productive   boolean  Sequence productivity status  True
sequence     string   Input sequence (preserved)    CAGGTG…

AIRR schema (--airr-format)

Additional columns

  • sequence_alignment — Aligned sequence
  • germline_alignment — Germline reference
  • v_sequence_start / v_sequence_end
  • d_sequence_start / d_sequence_end

Standardized fields

  • j_sequence_start / j_sequence_end
  • cdr3 / cdr3_aa
  • mutation_count
  • sequence_length

data types & formats

Gene calls

Single:
IGHV1-2*01
Multiple:
IGHV1-2*01,IGHV1-3*01

Coordinates

1-indexed positions:
v_sequence_start: 1, v_sequence_end: 285

Sequences

Uppercase nucleotides:
CAGGTGCAGCTG…
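
Downstream code typically needs to split multi-allele calls and turn the 1-indexed coordinates into Python slices. The sketch below assumes that end positions are inclusive, following the AIRR convention of 1-based closed intervals. This reference states only that positions are 1-indexed. The helper names `split_calls` and `segment` are hypothetical.

```python
def split_calls(call: str) -> list:
    """Split a comma-separated gene call field into individual alleles."""
    return [allele.strip() for allele in call.split(",") if allele.strip()]


def segment(sequence: str, start: int, end: int) -> str:
    """Extract positions start..end (1-indexed, end inclusive by assumption).

    Python slices are 0-indexed and end-exclusive, so only the start shifts.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return sequence[start - 1:end]
```

With the example coordinates above, `segment(sequence, 1, 285)` would return the first 285 nucleotides of the input sequence.
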
// quick reference

Quick reference card

Minimal command

python app.py run --model-checkpoint=… --chain-type=… --sequences=… --save-path=…

High stringency

--v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1

Large datasets

--batch-size=4096 --fix-orientation

Full output

--airr-format --save-predict-object

© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI