API Reference

Complete reference for the AlignAIR command-line interface, parameters, input formats, and output schemas.

Command Line Interface

Basic Syntax

python app.py [COMMAND] [OPTIONS]

Available Commands

Command	Type	Description
`run`	Primary	Execute AlignAIR sequence analysis with specified parameters
`--help`	Utility	Display help information and parameter list
`--version`	Utility	Show AlignAIR version information

Exit Codes

Code	Meaning
0	Success
1	General Error
2	Invalid Arguments
3	File Not Found
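
A wrapper script can translate these exit codes into readable messages. The following is a minimal sketch: `describe_exit_code` and `run_and_report` are hypothetical helper names, not part of AlignAIR, and the mapping is copied from the exit codes listed above.

```python
import subprocess

# Exit codes as listed in this reference.
EXIT_CODES = {
    0: "Success",
    1: "General Error",
    2: "Invalid Arguments",
    3: "File Not Found",
}


def describe_exit_code(code):
    """Return the documented meaning of an exit code, with a fallback for others."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_and_report(argv):
    """Run a command (a list of strings), print what its exit code means, return the code."""
    result = subprocess.run(argv)
    print(describe_exit_code(result.returncode))
    return result.returncode
```

Passing the full argument list (for example `["python", "app.py", "--version"]`) to `run_and_report` avoids shell quoting issues with paths.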

Complete Parameters Reference

Required Parameters

Parameter	Type	Description	Example
`--model-checkpoint`	string	Path to trained model weights	`/app/pretrained_models/IGH_S5F_576`
`--chain-type`	choice	Chain type: `heavy` or `light`	`heavy`
`--sequences`	string	Path to input sequence file	`/data/input/sequences.csv`
`--save-path`	string	Output directory path	`/data/output`

Optional Parameters

Model Configuration

Parameter	Type	Default	Description
`--max-input-size`	int	576	Maximum sequence length
`--batch-size`	int	2048	Processing batch size

Threshold Settings

Parameter	Type	Default	Description
`--v-allele-threshold`	float	0.75	V gene threshold
`--d-allele-threshold`	float	0.3	D gene threshold
`--j-allele-threshold`	float	0.8	J gene threshold
`--v-cap` / `--d-cap` / `--j-cap`	int	3	Maximum calls per gene

Processing Options

Parameter	Type	Description
`--airr-format`	flag	Output full AIRR schema
`--fix-orientation`	flag	Auto-correct sequence orientation
`--translate-to-asc`	flag	Use ASC allele names
`--save-predict-object`	flag	Save raw predictions for debugging
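
When AlignAIR is launched from another script, it can help to assemble the argument list programmatically. The sketch below is one possible approach: `build_command` is a hypothetical helper, not an AlignAIR API. It uses the `run` command and the `--key=value` form shown in the Quick Reference Card, and only the parameter names documented above.

```python
def build_command(model_checkpoint, chain_type, sequences, save_path, **options):
    """Build an argv list for `python app.py run` from the documented parameters.

    Keyword options use underscores in place of hyphens, e.g. v_allele_threshold=0.9.
    A value of True emits a bare flag; False or None omits the option entirely.
    """
    if chain_type not in ("heavy", "light"):
        raise ValueError("chain_type must be 'heavy' or 'light'")

    argv = [
        "python", "app.py", "run",
        f"--model-checkpoint={model_checkpoint}",
        f"--chain-type={chain_type}",
        f"--sequences={sequences}",
        f"--save-path={save_path}",
    ]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)               # boolean flag such as --airr-format
        elif value is False or value is None:
            continue                        # option not requested
        else:
            argv.append(f"{flag}={value}")  # valued option such as --batch-size=4096
    return argv
```

The resulting list can be passed directly to `subprocess.run`. The helper does not check option names against the tables above, so a misspelled keyword would only be caught by the CLI itself.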

Input Formats

CSV Format

Required Column:

sequence

Example:

sequence_id,sequence
seq_001,CAGGTGCAGCTG...
seq_002,GAGGTGCAGCTG...

TSV Format

Delimiter:

Tab-separated

Example:

sequence_id	sequence
seq_001	CAGGTGCAGCTG...
seq_002	GAGGTGCAGCTG...

FASTA Format

Standard Format:

>header

Example:

>seq_001
CAGGTGCAGCTGGTGGAG...
>seq_002
GAGGTGCAGCTGGTGGAG...
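
To inspect or pre-check an input file before running AlignAIR, the three formats can be read with the Python standard library. This is an illustrative sketch only: `read_delimited` and `read_fasta` are hypothetical helpers and do not reflect how AlignAIR itself parses input. The only schema assumption taken from this reference is the required `sequence` column.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Return (sequence_id, sequence) pairs from CSV or TSV text.

    The 'sequence' column is required; 'sequence_id' is None when absent.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "sequence" not in reader.fieldnames:
        raise ValueError("missing required column: sequence")
    return [(row.get("sequence_id"), row["sequence"]) for row in reader]


def read_fasta(text):
    """Return (header, sequence) pairs from FASTA text, joining wrapped lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For TSV input, call `read_delimited(text, delimiter="\t")`.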

Input Validation

✅ Valid Sequences

• DNA nucleotides: A, T, G, C
• IUPAC ambiguous codes: N, R, Y, etc.
• Minimum length: 50 nucleotides
• Maximum length: sequences longer than --max-input-size (default: 576) are trimmed automatically

❌ Invalid Sequences

• Protein sequences (amino acids)
• Empty or whitespace-only sequences
• Sequences with invalid characters
• Extremely short sequences (less than 50 nt)
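
The rules above can be approximated in a pre-flight check. The sketch below is not AlignAIR's own validator; `check_sequence` is a hypothetical helper. It assumes that the full set of IUPAC nucleotide codes is accepted (the list above names only N, R, and Y explicitly) and that input is case-insensitive.

```python
# Standard bases plus IUPAC ambiguity codes (assumed to be the accepted alphabet).
IUPAC_NUCLEOTIDES = frozenset("ACGTRYSWKMBDHVN")


def check_sequence(sequence, min_length=50):
    """Return None if the sequence passes the documented checks, else a reason string."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        return "empty or whitespace-only sequence"
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        # Typical protein sequences fail here: E, F, I, L, P and Q are not nucleotide codes.
        return "invalid characters: " + "".join(invalid)
    if len(cleaned) < min_length:
        return f"too short: {len(cleaned)} nt (minimum {min_length})"
    return None
```

The helper does not trim long sequences, since this reference does not say which end is trimmed when a sequence exceeds `--max-input-size`.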

Output Schema

Standard Output Columns

Column	Type	Description	Example
`sequence_id`	string	Unique sequence identifier	`seq_001`
`v_call`	string	V gene assignment(s)	`IGHV1-2*01`
`d_call`	string	D gene assignment(s)	`IGHD3-3*01`
`j_call`	string	J gene assignment(s)	`IGHJ4*01`
`productive`	boolean	Sequence productivity status	`True`
`sequence`	string	Input sequence (preserved)	`CAGGTG...`

AIRR Schema Output (--airr-format)

Additional Columns

• sequence_alignment - Aligned sequence
• germline_alignment - Germline reference
• v_sequence_start - V region start position
• v_sequence_end - V region end position
• d_sequence_start - D region start position
• d_sequence_end - D region end position

Standardized Fields

• j_sequence_start - J region start position
• j_sequence_end - J region end position
• cdr3 - CDR3 sequence
• cdr3_aa - CDR3 amino acid sequence
• mutation_count - Number of mutations
• sequence_length - Total sequence length

Data Types & Formats

Gene Calls

Single call:

IGHV1-2*01

Multiple calls:

IGHV1-2*01,IGHV1-3*01
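
Downstream code usually needs the individual calls rather than the comma-separated string. A minimal sketch, using the hypothetical helpers `split_calls` and `split_allele`, and assuming the `gene*allele` naming shown in the examples above (names produced with `--translate-to-asc` may follow a different pattern):

```python
def split_calls(call_field):
    """Split a v_call/d_call/j_call value into a list of individual allele calls."""
    return [call.strip() for call in call_field.split(",") if call.strip()]


def split_allele(call):
    """Split one call into (gene, allele); allele is None when no '*' is present."""
    gene, separator, allele = call.partition("*")
    return gene, (allele if separator else None)
```

For example, `split_calls` turns the multiple-call value above into two entries, and `split_allele` separates each into its gene name and allele number.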

Coordinates

1-indexed positions:

v_start: 1, v_end: 285
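
Because positions are 1-indexed, they need adjusting before use as Python slices. The sketch below assumes the end position is inclusive, as in the AIRR Rearrangement schema; this reference states only that positions are 1-indexed, so verify against real output. `extract_region` is a hypothetical helper.

```python
def extract_region(sequence, start, end):
    """Return the subsequence for 1-indexed positions start..end (end assumed inclusive)."""
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"invalid region {start}-{end} for length {len(sequence)}")
    # Shift only the start: Python slices are 0-indexed with an exclusive end.
    return sequence[start - 1:end]
```

With the example coordinates above (start 1, end 285), the extracted V region would be 285 nucleotides long.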

Sequences

Uppercase nucleotides:

CAGGTGCAGCTG...

📋 Quick Reference Card

Minimal Command

python app.py run --model-checkpoint=... --chain-type=... --sequences=... --save-path=...

High Stringency

--v-allele-threshold=0.9 --j-allele-threshold=0.9 --v-cap=1 --j-cap=1

Large Datasets

--batch-size=4096 --fix-orientation

Full Output

--airr-format --save-predict-object
