Usage
AlignAIR runs through its Docker container interface, which exposes a command-line entry point for sequence alignment tasks.
Basic Usage Example
After starting the AlignAIR Docker container, run the following command inside it:
python app.py run --model-checkpoint=/app/checkpoints/IGH_S5F_576 --save-path=/data/output --chain-type=heavy --sequences=/app/tests/sample_HeavyChain_dataset.csv
Modify the parameters as needed to match your input and model requirements.
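When several parameters change between runs, it can help to assemble the command from a dictionary rather than editing a long string by hand. The sketch below is illustrative only: `build_alignair_command` is a hypothetical helper, not part of AlignAIR, and it assumes that every parameter name maps to a hyphenated `--flag=value` option in the same way the four flags in the example above do.

```python
import shlex


def build_alignair_command(params):
    """Build an argument list for `python app.py run` from a parameter dict.

    Assumption: underscores in parameter names become hyphens in the flag,
    as seen in the example command (model_checkpoint -> --model-checkpoint).
    """
    args = ["python", "app.py", "run"]
    for name, value in params.items():
        flag = "--" + name.replace("_", "-")
        args.append(f"{flag}={value}")
    return args


# Same values as the basic usage example.
params = {
    "model_checkpoint": "/app/checkpoints/IGH_S5F_576",
    "save_path": "/data/output",
    "chain_type": "heavy",
    "sequences": "/app/tests/sample_HeavyChain_dataset.csv",
}

command = build_alignair_command(params)
print(shlex.join(command))
```

Inside the container, the resulting list could be passed to `subprocess.run(command, check=True)` to launch the run.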
Parameters Overview
Below is a detailed description of all parameters supported by AlignAIR in CLI mode.
Parameter | Description |
---|---|
Model Settings | |
model_checkpoint | Path to model weights. The Docker image ships with IGH_S5F_576 and IGL_S5F_576, located in /app/pretrained_models/. |
chain_type | Chain type to align: heavy or light. |
max_input_size | Maximum input window size. Default is 576. Longer reads are trimmed during preprocessing if needed. |
batch_size | Number of sequences per batch (default: 2048). Larger values can improve runtime if resources allow. |
Input and Output | |
sequences | Path to the sequence file (CSV/TSV/FASTA). Tabular inputs (CSV/TSV) must contain a sequence column. |
save_path | Path where output is saved (AIRR Schema CSV format). |
airr_format | Set to True to output the full AIRR Schema. By default, only the essential columns are written. |
Threshold Settings | |
v_allele_threshold | Threshold for V allele calling (default: 0.75). [See thresholding explanation] |
d_allele_threshold | Threshold for D allele calling (default: 0.3). |
j_allele_threshold | Threshold for J allele calling (default: 0.8). |
v_cap / d_cap / j_cap | Maximum number of calls allowed for V/D/J alleles (default: 3). |
Preprocessing and Corrections | |
translate_to_asc | Output ASC alleles instead of IMGT names if set to True. Default is False. |
fix_orientation | Automatically corrects reverse/complement/reverse-complement orientations before alignment. |
Reference Metadata and Configs | |
heavy_data_config / kappa_data_config / lambda_data_config | Paths to DataConfig pickle files. For the shipped default models these are set to "D"; explicit paths are required for custom models. |
custom_orientation_pipeline_path | Path to custom orientation model pickle. Leave empty for default models. |
custom_genotype | Path to YAML file defining genotype (V/D/J allele subset to use). |
finetuned_model_params_yaml | YAML specifying updated model parameters if fine-tuning has been performed. |
Debugging Options | |
save_predict_object | Save the internal PredictObject containing raw predictions and intermediate states. |
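
Because tabular inputs must contain a sequence column, a quick header check before launching a run can catch a malformed file early. The following is a minimal sketch, not part of AlignAIR: `has_sequence_column` is a hypothetical helper, and it assumes the column is named exactly `sequence` in lowercase and that the first row of the file is the header.

```python
import csv
import io


def has_sequence_column(handle, delimiter=","):
    """Return True if the header row of a CSV/TSV table has a 'sequence' column.

    Assumption: the first row is the header and the name is matched exactly.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    return "sequence" in (name.strip() for name in header)


# In-memory examples; for a real file use open(path, newline="").
csv_table = io.StringIO("id,sequence\nseq1,ACGT\n")
tsv_table = io.StringIO("id\tread\nseq1\tACGT\n")

print(has_sequence_column(csv_table))                   # True
print(has_sequence_column(tsv_table, delimiter="\t"))   # False
```

FASTA inputs have no header row, so a check like this applies only to the CSV and TSV cases.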