The Description stage includes modules to annotate and describe consensus sequence(s) and haplotypes. Use -h after any command for a list of options.
pairwise_align
Apply correct coordinate system to final sequence(s) to facilitate downstream analyses. Input is the final sequence file in FASTA format, a reference sequence in FASTA format, and a reference GFT file. Output is a JSON file to be used in extract_pairwise.
Usage:
haphpipe pairwise_align [SETTINGS] --amplicons_fa <FASTA> --ref_fa <FASTA> --ref_gtf <GTF> [--outdir]
(or):
hp_pairwise_align [SETTINGS] --amplicons_fa <FASTA> --ref_fa <FASTA> --ref_gtf <GTF> [--outdir]
Output files:
pairwise_aligned.json
Input/Output Arguments:
Option | Description |
---|---|
--amplicons_fa | Fasta file with assembled amplicons. |
--ref_fa | Reference fasta file. |
--ref_gtf | GTF format file containing amplicon regions. Primary and alternate coding regions should be provided in the attribute field (for amino acid alignment). |
--outdir | Output directory (default: False). |
Settings:
Option | Description |
---|---|
--keep_tmp | Keep temporary directory (default: False). |
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
haphpipe pairwise_align --amplicons_fa final.fna --ref_fa HIV_B.K03455.HXB2.fasta --ref_gtf HIV_B.K03455.HXB2.gtf
extract_pairwise
Extract sequence regions from the pairwise alignment produced in pairwise_align. Input is the JSON file from pairwise_align. Output is either an unaligned nucleotide FASTA file, an aligned nucleotide FASTA file, an amino acid FASTA file, an amplicon GTF file, or a tab-separated values (TSV) file (default: nucleotide FASTA with regions of interest from GTF file used in pairwise_align).
Usage:
haphpipe extract_pairwise [OPTIONS] [SETTINGS] --align_json <JSON> [--outdir]
(or):
hp_extract_pairwise [OPTIONS] [SETTINGS] --align_json <JSON> [--outdir]
Output files:
stdout.fasta
Input/Output Arguments:
Option | Description |
---|---|
--align_json | JSON file describing alignment (output of pairwise_align module). |
--outfile | Output file (default: stdout). |
Options:
Option | Description |
---|---|
--outfmt | Format for output: nuc_fa, aln_fa, amp_gtf, ost, or prot_fa (default: nuc_fa). |
--refreg | Reference region. String format is ref:start-stop. For example, the region string to extract pol when aligned to HXB2 is HIV_B.K03455.HXB2:2085-5096. |
Settings:
Option | Description |
---|---|
--debug | Print commands but do not run (default: False). |
Example usage:
haphpipe extract_pairwise --align_json pairwise_aligned.json --refreg HIV_B.K03455.HXB2:2085-5096
summary_stats
Report summary statistics from an alignment and/or haplotype calling as TXT and TSV files. Input is a list of paths to directories (TXT format, one per line), each of which contain the following files: final_bt2.out
, trimmomatic_summary.out
, final.bam
, final.fna
, and final.vcf.gz
.
If applicable, also input a list of directories containing PredictHaplo summary files (ph_summary.txt
). If amplicons were used in assembly, use the --amplicons
option to report statistics per amplicon.
Usage:
haphpipe summary_stats [SETTINGS] --dir_list <TXT> [--ph_list <TXT>] [--amplicons] [--outdir]
(or):
hp_summary_stats [SETTINGS] --dir_list <TXT> [--ph_list <TXT>] [--amplicons] [--outdir]
Output files:
summary_stats.txt, summary_stats.tsv, PH_summary_stats.tsv
Input/Output Arguments:
Option | Description |
---|---|
--dir_list | List of directories which include the required files, one on each line. |
--ph_list | List of directories which include haplotype summary files, one on each line. |
--amplicons | Amplicons used in assembly (default: False). |
Settings:
Option | Description |
---|---|
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Name for log file. |
--debug | Print commands but do not run (default: False). |
Example usage:
haphpipe summary_stats --dir_list demo_sra_list.txt --ph_list demo_sra_ph_list.txt --amplicons
annotate_ from_ref
Annotate consensus sequence from reference annotation. Input is JSON file from pairwise_align and reference GTF file.
Usage:
haphpipe annotate_from_ref [OPTIONS] [SETTINGS] --haplotypes_fa <best.fas> [--outdir]
(or):
hp_annotate_from_ref [OPTIONS] [SETTINGS] --haplotypes_fa <best.fas> [--outdir]
Output files:
Input/Output Arguments
Settings:
Example usage: (add)