
API Reference

Complete reference for the AlignAIR command-line interface, its parameters, input formats, and output schemas.

Command Line Interface

Basic Syntax

python app.py [COMMAND] [OPTIONS]

Available Commands

run (Primary)

Execute AlignAIR sequence analysis with the specified parameters.

--help (Utility)

Display help information and the parameter list.

--version (Utility)

Show AlignAIR version information.

Exit Codes

0: Success
1: General Error
2: Invalid Arguments
3: File Not Found
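The exit codes can be checked from a wrapper script. The sketch below is a hypothetical helper, not part of AlignAIR: it assumes app.py is reachable at the given path, runs the run command through Python's subprocess module, and maps the return code to the meanings listed above.

```python
import subprocess
import sys

# Exit codes as listed in the AlignAIR documentation.
EXIT_CODES = {
    0: "Success",
    1: "General Error",
    2: "Invalid Arguments",
    3: "File Not Found",
}


def describe_exit_code(code: int) -> str:
    """Map an AlignAIR exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_alignair(args: list[str], app_path: str = "app.py") -> int:
    """Hypothetical wrapper: run `python app.py run <args>` and report the outcome.

    app_path is an assumption; point it at wherever app.py lives.
    """
    completed = subprocess.run([sys.executable, app_path, "run", *args])
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```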

Complete Parameters Reference

Required Parameters

Parameter | Type | Description | Example
--model-checkpoint | string | Path to trained model weights | /app/pretrained_models/IGH_S5F_576
--chain-type | choice | Chain type: heavy or light | heavy
--sequences | string | Path to input sequence file | /data/input/sequences.csv
--save-path | string | Output directory path | /data/output
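When scripting many runs, the four required options can be assembled programmatically. This is a minimal sketch with a hypothetical helper name (build_run_args); it uses the --flag=value form shown in the Quick Reference Card and rejects chain types other than heavy or light before AlignAIR is ever launched.

```python
CHAIN_TYPES = ("heavy", "light")


def build_run_args(model_checkpoint: str, chain_type: str,
                   sequences: str, save_path: str) -> list[str]:
    """Hypothetical helper: assemble the four required `run` options as argv tokens."""
    if chain_type not in CHAIN_TYPES:
        raise ValueError(f"--chain-type must be one of {CHAIN_TYPES}, got {chain_type!r}")
    return [
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]
```

The result can be appended to ["python", "app.py", "run"] to form the minimal command.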

Optional Parameters

Model Configuration

--max-input-size | int (default: 576) | Maximum input sequence length (nucleotides)
--batch-size | int (default: 2048) | Number of sequences processed per batch

Threshold Settings

--v-allele-threshold | float (default: 0.75) | Threshold for V allele calls
--d-allele-threshold | float (default: 0.3) | Threshold for D allele calls
--j-allele-threshold | float (default: 0.8) | Threshold for J allele calls
--v-cap / --d-cap / --j-cap | int (default: 3) | Maximum number of allele calls reported per gene segment

Processing Options

--airr-format | flag | Output the full AIRR schema
--fix-orientation | flag | Automatically correct sequence orientation
--translate-to-asc | flag | Use ASC allele names
--save-predict-object | flag | Save raw predictions for debugging
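Optional parameters come in two shapes: valued options (--batch-size=4096) and bare switches (--airr-format). A minimal sketch of turning a settings dictionary into those tokens follows; the helper name and the convention of underscore keys mapping to dashed flag names are assumptions for illustration only.

```python
def build_optional_args(options: dict) -> list[str]:
    """Hypothetical helper: convert {"batch_size": 4096, "fix_orientation": True} to CLI tokens.

    Assumptions: keys use underscores and map to the dashed flag names;
    True emits a bare switch, False/None omits the option entirely.
    """
    args = []
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)               # switch such as --airr-format
        elif value is False or value is None:
            continue                        # switch off / option left at its default
        else:
            args.append(f"{flag}={value}")  # valued option such as --batch-size=4096
    return args
```

For example, the High Stringency preset corresponds to {"v_allele_threshold": 0.9, "j_allele_threshold": 0.9, "v_cap": 1, "j_cap": 1}.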

Input Formats

CSV Format

Required column: sequence
Example:
sequence_id,sequence
seq_001,CAGGTGCAGCTG...
seq_002,GAGGTGCAGCTG...

TSV Format

Delimiter: tab
Example (columns separated by tabs):
sequence_id	sequence
seq_001	CAGGTGCAGCTG...
seq_002	GAGGTGCAGCTG...

FASTA Format

Standard format: each record starts with a >header line, followed by the sequence.
Example:
>seq_001
CAGGTGCAGCTGGTGGAG...
>seq_002
GAGGTGCAGCTGGTGGAG...
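To check an input file before submitting it, the three formats can be read with the Python standard library alone. This sketch is not AlignAIR's own loader: it assumes CSV/TSV files carry sequence_id and sequence columns (only sequence is documented as required) and that FASTA sequences may wrap over several lines, with the ID taken as the first word of the header.

```python
import csv
import io


def read_sequences(text: str, fmt: str) -> list[tuple[str, str]]:
    """Parse CSV, TSV or FASTA text into (sequence_id, sequence) pairs (sketch)."""
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        # sequence is required; sequence_id is optional in this sketch
        return [(row.get("sequence_id") or "", row["sequence"]) for row in reader]
    if fmt == "fasta":
        records, header, chunks = [], None, []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                # Use the first word of the header line as the ID
                header, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)  # sequences may wrap over several lines
        if header is not None:
            records.append((header, "".join(chunks)))
        return records
    raise ValueError(f"unsupported format: {fmt!r}")
```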

Input Validation

✅ Valid Sequences

  • DNA nucleotides: A, T, G, C
  • IUPAC ambiguity codes: N, R, Y, etc.
  • Minimum length: 50 nucleotides
  • Maximum length: sequences longer than --max-input-size are automatically trimmed

❌ Invalid Sequences

  • Protein sequences (amino acids)
  • Empty or whitespace-only sequences
  • Sequences containing characters other than nucleotides or IUPAC codes
  • Very short sequences (fewer than 50 nt)
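The validation rules above can be mirrored in a pre-flight check. This is a minimal sketch, not AlignAIR's internal validator: the helper name is hypothetical, and keeping the first max-input-size bases when trimming is an assumption. Most protein sequences are rejected because amino-acid letters such as E, F, L and Q fall outside the IUPAC nucleotide alphabet.

```python
# Accepted characters: A, C, G, T plus the IUPAC nucleotide ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTNRYSWKMBDHV")
MIN_LENGTH = 50


def validate_sequence(seq: str, max_input_size: int = 576) -> str:
    """Hypothetical pre-flight check: return a cleaned sequence or raise ValueError."""
    cleaned = seq.strip().upper()
    if not cleaned:
        raise ValueError("empty or whitespace-only sequence")
    invalid = set(cleaned) - IUPAC_NUCLEOTIDES
    if invalid:
        raise ValueError(f"invalid characters: {''.join(sorted(invalid))}")
    if len(cleaned) < MIN_LENGTH:
        raise ValueError(f"sequence shorter than {MIN_LENGTH} nt")
    # Assumption: trimming keeps the first max_input_size bases
    return cleaned[:max_input_size]
```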

Output Schema

Standard Output Columns

Column | Type | Description | Example
sequence_id | string | Unique sequence identifier | seq_001
v_call | string | V gene assignment(s) | IGHV1-2*01
d_call | string | D gene assignment(s) | IGHD3-3*01
j_call | string | J gene assignment(s) | IGHJ4*01
productive | boolean | Sequence productivity status | True
sequence | string | Input sequence (preserved) | CAGGTG...
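Downstream code usually wants these columns as native types. A minimal sketch, assuming each row is a dictionary such as csv.DictReader yields from the output file, that multiple calls are comma-separated as described under Data Types & Formats, and that productive is written as text ("True", or AIRR-style "T"). The helper name is hypothetical.

```python
def parse_output_row(row: dict) -> dict:
    """Hypothetical helper: convert one standard-output row into Python types."""

    def split_calls(value) -> list[str]:
        # Missing or empty fields (e.g. d_call for light chains) become []
        return [c.strip() for c in (value or "").split(",") if c.strip()]

    return {
        "sequence_id": row["sequence_id"],
        "v_call": split_calls(row.get("v_call")),
        "d_call": split_calls(row.get("d_call")),
        "j_call": split_calls(row.get("j_call")),
        "productive": (row.get("productive") or "").strip().lower() in ("true", "t"),
        "sequence": row.get("sequence") or "",
    }
```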

AIRR Schema Output (--airr-format)

Additional Columns

  • sequence_alignment - Aligned query sequence
  • germline_alignment - Aligned germline reference sequence
  • v_sequence_start - V region start position
  • v_sequence_end - V region end position
  • d_sequence_start - D region start position
  • d_sequence_end - D region end position

Standardized Fields

  • j_sequence_start - J region start position
  • j_sequence_end - J region end position
  • cdr3 - CDR3 sequence
  • cdr3_aa - CDR3 amino acid sequence
  • mutation_count - Number of mutations
  • sequence_length - Total sequence length
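In the AIRR Rearrangement schema, cdr3_aa is the amino-acid translation of cdr3. The sketch below illustrates that relationship using the standard genetic code; how AlignAIR itself handles ambiguous or incomplete codons is not documented here, so mapping them to X and dropping a trailing partial codon are assumptions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate nucleotides codon by codon (sketch).

    Codons with ambiguity codes (e.g. N) become 'X'; a trailing
    partial codon is ignored.
    """
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )
```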

Data Types & Formats

Gene Calls

Single call:
IGHV1-2*01
Multiple calls (comma-separated):
IGHV1-2*01,IGHV1-3*01
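A call field can be split into gene and allele parts for grouping or counting. This sketch assumes IMGT-style names in which * separates the gene from the allele number; names produced with --translate-to-asc may follow a different pattern, so entries without * are returned with an empty allele. The helper name is hypothetical.

```python
def parse_calls(call_field: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split "IGHV1-2*01,IGHV1-3*01" into (gene, allele) pairs."""
    pairs = []
    for call in call_field.split(","):
        call = call.strip()
        if not call:
            continue
        gene, _, allele = call.partition("*")  # allele is "" if no '*' present
        pairs.append((gene, allele))
    return pairs
```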

Coordinates

1-indexed positions (e.g. v_sequence_start, v_sequence_end):
v_sequence_start: 1, v_sequence_end: 285
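Because Python strings are 0-indexed and slices exclude their end, a region has to be extracted with start - 1 as the slice start. This sketch assumes the end coordinate is inclusive, which is the AIRR convention (1-based closed intervals); the helper name is hypothetical.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Hypothetical helper: return the segment between 1-indexed, inclusive positions.

    Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates: start={start}, end={end}")
    return sequence[start - 1:end]
```

For example, extract_region(row_sequence, v_sequence_start, v_sequence_end) returns the V region as a string of end - start + 1 bases.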

Sequences

Uppercase nucleotides:
CAGGTGCAGCTG...

📋 Quick Reference Card

Minimal Command

python app.py run --model-checkpoint=... --chain-type=... --sequences=... --save-path=...

High Stringency

--v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1

Large Datasets

--batch-size=4096 --fix-orientation

Full Output

--airr-format --save-predict-object
© 2025 AlignAIR. All rights reserved. Advancing computational biology through AI