Usage

The easiest way to use AlignAIR is through its Docker container, which bundles the command-line interface together with pretrained models for sequence alignment.

Basic Usage Example

After starting the AlignAIR Docker container, run the following command inside it to align the bundled heavy-chain sample dataset:

python app.py run \
    --model-checkpoint=/app/checkpoints/IGH_S5F_576 \
    --save-path=/data/output \
    --chain-type=heavy \
    --sequences=/app/tests/sample_HeavyChain_dataset.csv

Adjust the parameters to match your data: point --sequences at your own input file, and set --model-checkpoint and --chain-type to the model and chain type that fit it.
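
If you drive AlignAIR from a Python script inside the container (for example, to process several input files in turn), the same command can be assembled and launched with the standard subprocess module. This is a minimal sketch: build_alignair_command is a hypothetical helper, not part of AlignAIR, and it only reuses the four flags shown in the example command.

```python
import subprocess
from typing import Dict, List


def build_alignair_command(options: Dict[str, str]) -> List[str]:
    """Assemble the argument list for `python app.py run` from flag/value pairs."""
    cmd = ["python", "app.py", "run"]
    for flag, value in options.items():
        # Pass each option in the --flag=value form used by the example command.
        cmd.append(f"{flag}={value}")
    return cmd


# Same settings as the example command.
options = {
    "--model-checkpoint": "/app/checkpoints/IGH_S5F_576",
    "--save-path": "/data/output",
    "--chain-type": "heavy",
    "--sequences": "/app/tests/sample_HeavyChain_dataset.csv",
}
cmd = build_alignair_command(options)
print(" ".join(cmd))

# Inside the container, launch the run and raise CalledProcessError if it fails:
# subprocess.run(cmd, check=True)
```

Passing an argument list instead of a single shell string avoids quoting problems when paths contain spaces.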

Parameters Overview

The parameters supported by AlignAIR in CLI mode are listed below, grouped by purpose.

Model Settings
model_checkpoint: Path to the model weights. The Docker image ships with the IGH_S5F_576 and IGL_S5F_576 models, located in /app/pretrained_models/.
chain_type: Chain type to align, either heavy or light.
max_input_size: Maximum input window size (default: 576). Longer reads are trimmed during preprocessing.
batch_size: Number of sequences per batch (default: 2048). Larger batches can reduce runtime if memory allows.

Input and Output
sequences: Path to the input sequence file (CSV, TSV, or FASTA). CSV and TSV files must contain a sequence column.
save_path: Path where the output is saved, as a CSV file following the AIRR Schema.
airr_format: If True, output the full set of AIRR Schema columns; by default, only the essential columns are written.

Threshold Settings
v_allele_threshold: Threshold for V allele calling (default: 0.75). See the thresholding explanation.
d_allele_threshold: Threshold for D allele calling (default: 0.3).
j_allele_threshold: Threshold for J allele calling (default: 0.8).
v_cap / d_cap / j_cap: Maximum number of V, D, and J allele calls reported per sequence (default: 3).

Preprocessing and Corrections
translate_to_asc: If True, report ASC allele names instead of IMGT names (default: False).
fix_orientation: Automatically corrects reversed, complemented, and reverse-complemented sequences before alignment.

Reference Metadata and Configs
heavy_data_config / kappa_data_config / lambda_data_config: Paths to the DataConfig pickle files for each chain. For the shipped models these are set to "D"; custom models require explicit paths.
custom_orientation_pipeline_path: Path to a custom orientation model pickle. Leave empty when using the default models.
custom_genotype: Path to a YAML file defining a genotype, i.e., the subset of V, D, and J alleles to use.
finetuned_model_params_yaml: Path to a YAML file with the updated model parameters, used when the model has been fine-tuned.

Debugging Options
save_predict_object: Save the internal PredictObject, which contains the raw predictions and intermediate states.
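
To get a feel for how batch_size and max_input_size interact, the sketch below trims and batches a list of reads using the documented defaults: 5,000 reads with batch_size=2048 give three batches. plan_batches is a hypothetical helper, and its trimming rule (keep the first max_input_size bases) is an assumption; the parameter list only says that longer reads are trimmed.

```python
from typing import List, Sequence


def plan_batches(reads: Sequence[str], batch_size: int = 2048,
                 max_input_size: int = 576) -> List[List[str]]:
    """Trim reads to max_input_size characters, then split them into batches."""
    # ASSUMPTION: plain truncation that keeps the first max_input_size bases.
    # AlignAIR's actual trimming rule may differ.
    trimmed = [r[:max_input_size] for r in reads]
    return [trimmed[i:i + batch_size] for i in range(0, len(trimmed), batch_size)]


reads = ["ACGT" * 200] * 5000          # 5,000 made-up 800-nt reads
batches = plan_batches(reads)
print(len(batches))                     # 3
print([len(b) for b in batches])        # [2048, 2048, 904]
print(len(batches[0][0]))               # 576
```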
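
For the sequences parameter, a CSV or TSV input needs a sequence column. The sketch below writes such a table with Python's csv module. write_input_table is a hypothetical helper, the reads are toy fragments, and it assumes the required header is literally "sequence".

```python
import csv
import io
from typing import Iterable, TextIO


def write_input_table(out: TextIO, sequences: Iterable[str], delimiter: str = ",") -> None:
    """Write one read per row under a 'sequence' header (delimiter='\\t' gives TSV)."""
    # ASSUMPTION: the required column header is literally "sequence".
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(["sequence"])
    for seq in sequences:
        writer.writerow([seq.strip()])


# Toy reads; for a real run, open("reads.csv", "w", newline="") instead of StringIO.
buf = io.StringIO()
write_input_table(buf, ["CAGGTGCAGCTGGTGCAG", "GAGGTGCAGCTGTTGGAG"])
print(buf.getvalue())
```

FASTA files are also accepted as input, so a conversion like this is only needed when your reads live in some other tabular or in-memory form.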
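
The threshold and cap settings work together: an allele must pass its threshold, and at most cap alleles are reported. The sketch below illustrates that combination with a plain absolute cutoff, which is a simplification; AlignAIR's own rule is described in the thresholding explanation and may differ. The optional genotype argument mimics the effect of custom_genotype by ignoring alleles outside an allowed subset; it takes a Python set, not the YAML file. call_alleles and the likelihood values are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def call_alleles(likelihoods: Dict[str, float], threshold: float, cap: int = 3,
                 genotype: Optional[Iterable[str]] = None) -> List[str]:
    """Return up to `cap` alleles scoring >= `threshold`, highest first."""
    # Optional genotype restriction: ignore alleles outside the allowed subset.
    allowed = set(genotype) if genotype is not None else None
    passing = [(allele, p) for allele, p in likelihoods.items()
               if p >= threshold and (allowed is None or allele in allowed)]
    passing.sort(key=lambda item: item[1], reverse=True)
    return [allele for allele, _ in passing[:cap]]


# Made-up per-allele likelihoods for one sequence.
v_scores = {"IGHV1-2*02": 0.91, "IGHV1-2*04": 0.82, "IGHV1-2*06": 0.78,
            "IGHV1-69*01": 0.77, "IGHV3-23*01": 0.05}
print(call_alleles(v_scores, threshold=0.75, cap=3))
# ['IGHV1-2*02', 'IGHV1-2*04', 'IGHV1-2*06']
print(call_alleles(v_scores, threshold=0.75, cap=3,
                   genotype={"IGHV1-2*04", "IGHV1-69*01"}))
# ['IGHV1-2*04', 'IGHV1-69*01']
```

Four alleles pass the 0.75 cutoff in the first call, so the cap of 3 drops the weakest one.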
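
fix_orientation undoes one of three transformations of a read. The sketch below shows them as plain string operations and checks that each one is its own inverse, which is why correcting a read only requires knowing which transformation was applied. Detecting that is the job of AlignAIR's orientation pipeline (see custom_orientation_pipeline_path) and is not reproduced here; the helper names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq: str) -> str:
    """Swap each base for its Watson-Crick partner (A<->T, C<->G); N stays N."""
    return seq.translate(_COMPLEMENT)


def reverse(seq: str) -> str:
    """Read the sequence back to front."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return reverse(complement(seq))


read = "ATGCCA"
for name, fn in [("reverse", reverse), ("complement", complement),
                 ("reverse_complement", reverse_complement)]:
    oriented = fn(read)
    # Each transform is its own inverse, so applying it again restores the read.
    assert fn(oriented) == read
    print(f"{name:>18}: {oriented}")
# Prints ACCGTA, TACGGT and TGGCAT for the three transforms.
```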